ARE THEORIES OF LEARNING NECESSARY? 1


BY B. F. SKINNER 


Harvard University 


Certain basic assumptions, essential 
to any scientific activity, are sometimes 
called theories. That nature is orderly 
rather than capricious is an example. 
Certain statements are also theories 
simply to the extent that they are not 
yet facts. A scientist may guess at the 
result of an experiment before the ex- 
periment is carried out. The prediction 
and the later statement of result may 
be composed of the same terms in the 
same syntactic arrangement, the differ- 
ence being in the degree of confidence. 
No empirical statement is wholly non- 
theoretical in this sense, because evi- 
dence is never complete, nor is any pre- 
diction probably ever made wholly with- 
out evidence. The term “theory” will 
not refer here to statements of these 
sorts but rather to any explanation of 
an observed fact which appeals to events 
taking place somewhere else, at some 
other level of observation, described in 
different terms, and measured, if at all, 
in different dimensions. 

Three types of theory in the field of 
learning satisfy this definition. The 
most characteristic is to be found in the 
field of physiological psychology. We 
are all familiar with the changes that 
are supposed to take place in the nerv- 
ous system when an organism learns. 

1 Address of the president, Midwestern Psychological
Association, Chicago, Illinois, May, 1949.


Synaptic connections are made or 
broken, electrical fields are disrupted 
or reorganized, concentrations of ions 
are built up or allowed to diffuse away, 
and so on. In the science of neuro- 
physiology statements of this sort are 
not necessarily theories in the present 
sense. But in a science of behavior, 
where we are concerned with whether or 
not an organism secretes saliva when a 
bell rings, or jumps toward a gray triangle,
or says bik when a card reads tuz, or loves
someone who resembles his mother, all
statements about the nervous system are
theories in the sense that
they are not expressed in the same terms 
and could not be confirmed with the 
same methods of observation as the 
facts for which they are said to account. 

A second type of learning theory is in 
practice not far from the physiological, 
although there is less agreement about 
the method of direct observation. Theo- 
ries of this type have always dominated 
the field of human behavior. They con- 
sist of references to “mental” events, as 
in saying that an organism learns to be- 
have in a certain way because it “finds 
something pleasant” or because it “ex- 
pects something to happen.” To the 
mentalistic psychologist these explana- 
tory events are no more theoretical than 
synaptic connections to the neurophysi- 
ologist, but in a science of behavior 
they are theories because the methods
and terms appropriate to the events to
be explained differ from the methods 
and terms appropriate to the explaining 
events. 

In a third type of learning theory the 
explanatory events are not directly ob- 
served. The writer’s suggestion that the 
letters CNS be regarded as representing, 
not the Central Nervous System, but 
the Conceptual Nervous System (2, p. 
421), seems to have been taken seri- 
ously. Many theorists point out that 
they are not talking about the nerv- 
ous system as an actual structure un- 
dergoing physiological or bio-chemical 
changes but only as a system with a 
certain dynamic output. Theories of 
this sort are multiplying fast, and so are 
parallel operational versions of mental 
events. A purely behavioral definition 
of expectancy has the advantage that 
the problem of mental observation is 
avoided and with it the problem of how 
a mental event can cause a physical one.
But such theories do not go so far as to
assert that the explanatory events are
identical with the behavioral facts which
they purport to explain. A statement
about behavior may support such a 
theory but will never resemble it in 
terms or syntax. Postulates are good 
examples. True postulates cannot be- 
come facts. Theorems may be deduced 
from them which, as tentative state- 
ments about behavior, may or may not 
be confirmed, but theorems are not 
theories in the present sense. Postulates 
remain theories until the end. 

It is not the purpose of this paper to 
show that any of these theories cannot 
be put in good scientific order, or that 
the events to which they refer may not 
actually occur or be studied by appro- 
priate sciences. It would be foolhardy 
to deny the achievements of theories of 
this sort in the history of science. The 
question of whether they are necessary, 
however, has other implications and is 
worth asking. If the answer is no, then
it may be possible to argue effectively
against theory in the field of learning. 
A science of behavior must eventually 
deal with behavior in its relation to cer- 
tain manipulable variables. Theories— 
whether neural, mental, or conceptual— 
talk about intervening steps in these re- 
lationships. But instead of prompting 
us to search for and explore relevant 
variables, they frequently have quite 
the opposite effect. When we attribute 
behavior to a neural or mental event, 
real or conceptual, we are likely to for- 
get that we still have the task of ac- 
counting for the neural or mental event. 
When we assert that an animal acts in a 
given way because it expects to receive 
food, then what began as the task of 
accounting for learned behavior becomes 
the task of accounting for expectancy. 
The problem is at least equally complex 
and probably more difficult. We are 
likely to close our eyes to it and to use 
the theory to give us answers in place of 
the answers we might find through fur- 
ther study. It might be argued that the 
principal function of learning theory to 
date has been, not to suggest appropri- 
ate research, but to create a false sense 
of security, an unwarranted satisfaction 
with the status quo. 

Research designed with respect to 
theory is also likely to be wasteful. 
That a theory generates research does 
not prove its value unless the research 
is valuable. Much useless experimenta- 
tion results from theories, and much 
energy and skill are absorbed by them. 
Most theories are eventually overthrown, 
and the greater part of the associated 
research is discarded. This could be 
justified if it were true that productive 
research requires a theory, as is, of 
course, often claimed. It is argued that 
research would be aimless and disor- 
ganized without a theory to guide it. 
The view is supported by psychological 
texts that take their cue from the logicians
rather than empirical science and
describe thinking as necessarily involving
stages of hypothesis, deduction, experimental
test, and confirmation. But
this is not the way most scientists actu- 
ally work. It is possible to design sig- 
nificant experiments for other reasons 
and the possibility to be examined is 
that such research will lead more di- 
rectly to the kind of information that 
a science usually accumulates. 

The alternatives are at least worth 
considering. How much can be done 
without theory? What other sorts of 
scientific activity are possible? And 
what light do alternative practices throw 
upon our present preoccupation with 
theory? 

It would be inconsistent to try to answer
these questions at a theoretical
level. Let us therefore turn to some
experimental material in three areas in
which theories of learning now flourish
and raise the question of the function of
theory in a more concrete fashion.2


The Basic Datum in Learning 


What actually happens when an or- 
ganism learns is not an easy question. 
Those who are interested in a science 
of behavior will insist that learning is a 
change in behavior, but they tend to 
avoid explicit references to responses or 
acts as such. “Learning is adjustment, 
or adaptation to a situation.” But of 
what stuff are adjustments and adapta- 
tions made? Are they data, or infer- 
ences from data? “Learning is improve- 
ment.” But improvement in what? And 
from whose point of view? “Learning is
restoration of equilibrium.” But what 


2 Some of the material that follows was ob- 
tained in 1941-42 in a cooperative study on 
the behavior of the pigeon in which Keller 
Breland, Norman Guttman, and W. K. Estes 
collaborated. Some of it is selected from sub- 
sequent, as yet unpublished, work on the pi- 
geon conducted by the author at Indiana Uni- 
versity and Harvard University. Limitations 
of space make it impossible to report full details
here.

is in equilibrium and how is it put there?
“Learning is problem solving.” But what 
are the physical dimensions of a problem 
—or of a solution? Definitions of this 
sort show an unwillingness to take what 
appears before the eyes in a learning 
experiment as a basic datum. Particu- 
lar observations seem too trivial. An 
error score falls; but we are not ready 
to say that this is learning rather than 
merely the result of learning. An or- 
ganism meets a criterion of ten success- 
ful trials; but an arbitrary criterion is 
at variance with our conception of the 
generality of the learning process. 

This is where theory steps in. If it 
is not the time required to get out of a 
puzzle box that changes in learning, but 
rather the strength of a bond, or the 
conductivity of a neural pathway, or 
the excitatory potential of a habit, then 
problems seem to vanish. Getting out 
of a box faster and faster is not learn- 
ing; it is merely performance. The 
learning goes on somewhere else, in a 
different dimensional system. And al- 
though the time required depends upon 
arbitrary conditions, often varies dis- 
continuously, and is subject to reversals 
of magnitude, we feel sure that the 
learning process itself is continuous, or- 
derly, and beyond the accidents of 
measurement. Nothing could better 
illustrate the use of theory as a refuge 
from the data. 

But we must eventually get back to 
an observable datum. If learning is the 
process we suppose it to be, then it must 
appear so in the situations in which we 
study it. Even if the basic process be- 
longs to some other dimensional system, 
our measures must have relevant and 
comparable properties. But productive 
experimental situations are hard to find, 
particularly if we accept certain plau- 
sible restrictions. To show an orderly 
change in the behavior of the average 
rat or ape or child is not enough, since 
learning is a process in the behavior of
the individual. To record the beginning
and end of learning or a few discrete 
steps will not suffice, since a series of 
cross-sections will not give complete 
coverage of a continuous process. The 
dimensions of the change must spring 
from the behavior itself; they must not 
be imposed by an external judgment of 
success or failure or an external criterion 
of completeness. But when we review 
the literature with these requirements 
in mind, we find little justification for 
the theoretical process in which we take 
so much comfort. 

The energy level or work-output of 
behavior, for example, does not change 
in appropriate ways. In the sort of be- 
havior adapted to the Pavlovian experi- 
ment (respondent behavior) there may 
be a progressive increase in the magni- 
tude of response during learning. But 
we do not shout our responses louder 
and louder as we learn verbal material,
nor does a rat press a lever harder and
harder as conditioning proceeds. In
operant behavior the energy or magnitude
of response changes significantly
only when some arbitrary value is
differentially reinforced, that is, when such a
change is what is learned.

The emergence of a right response in 
competition with wrong responses is an- 
other datum frequently used in the 
study of learning. The maze and the 
discrimination box yield results which 
may be reduced to these terms. But a 
behavior-ratio of right vs. wrong can- 
not yield a continuously changing meas- 
ure in a single experiment on a single 
organism. The point at which one re- 
sponse takes precedence over another 
cannot give us the whole history of the 
change in either response. Averaging 
curves for groups of trials or organisms 
will not solve this problem. 

Increasing attention has recently been 
given to latency, the relevance of which, 
like that of energy level, is suggested by 
the properties of conditioned and unconditioned
reflexes. But in operant behavior
the relation to a stimulus is different.
A measure of latency involves
other considerations, as inspection of
any case will show. Most operant responses
may be emitted in the absence
of what is regarded as a relevant stimulus.
In such a case the response is
likely to appear before the stimulus is
presented. It is no solution to escape
this embarrassment by locking a lever 
so that an organism cannot press it 
until the stimulus is presented, since we 
can scarcely be content with temporal 
relations that have been forced into 
compliance with our expectations. Run- 
way latencies are subject to this objec- 
tion. In a typical experiment the door 
of a starting box is opened and the time 
that elapses before a rat leaves the box 
is measured. Opening the door is not 
only a stimulus, it is a change in the 
situation that makes the response pos- 
sible for the first time. The time meas- 
ured is by no means as simple as a la- 
tency and requires another formulation. 
A great deal depends upon what the 
rat is doing at the moment the stimu- 
lus is presented. Some experimenters 
wait until the rat is facing the door, 
but to do so is to tamper with the meas- 
urement being taken. If, on the other 
hand, the door is opened without refer- 
ence to what the rat is doing, the first 
major effect is the conditioning of fa- 
vorable waiting behavior. The rat even- 
tually stays near and facing the door. 
The resulting shorter starting-time is 
not due to a reduction in the latency of 
a response, but to the conditioning of 
favorable preliminary behavior.

Latencies in a single organism do not
follow a simple learning process. Rele- 
vant data on this point were obtained as 
part of an extensive study of reaction 
time. A pigeon, enclosed in a box, is 
conditioned to peck at a recessed disc 
in one wall. Food is presented as reinforcement
by exposing a hopper through
a hole below the disc. If responses are
reinforced only after a stimulus has 
been presented, responses at other times 
disappear. Very short reaction times 
are obtained by differentially reinforc- 
ing responses which occur very soon 
after the stimulus (4). But responses 
also come to be made very quickly with- 
out differential reinforcement. Inspec- 
tion shows that this is due to the de- 
velopment of effective waiting. The 
bird comes to stand before the disc with 
its head in good striking position. Un- 
der optimal conditions, without differ- 
ential reinforcement, the mean time be- 
tween stimulus and response will be of 
the order of 1/3 sec. This is not a true
reflex latency, since the stimulus is dis- 
criminative rather than eliciting, but 
it is a fair example of the latency used 
in the study of learning. The point is 
that this measure does not vary con- 
tinuously or in an orderly fashion. By 
giving the bird more food, for example, 
we induce a condition in which it does 
not always respond. But the responses 
that occur show approximately the same 
temporal relation to the stimulus (Fig. 
1, middle curve). In extinction, of spe- 
cial interest here, there is a scattering 
of latencies because lack of reinforce- 
ment generates an emotional condition. 








[Fig. 1. Panel label: STANDARD HUNGER (EXTINCTION). Ordinate: RESPONSES. Abscissa: RESPONSE TIME IN TENTHS.]


Some responses occur sooner and others 
are delayed, but the commonest value 
remains unchanged (bottom curve in 
Fig. 1). The longer latencies are easily 
explained by inspection. Emotional be- 
havior, of which examples will be men- 
tioned later, is likely to be in progress 
when the ready-signal is presented. It 
is often not discontinued before the 
“go” signal is presented, and the result 
is a long starting-time. Cases also be- 
gin to appear in which the bird simply 
does not respond at all during a speci- 
fied time. If we average a large num- 
ber of readings, either from one bird or 
many, we may create what looks like a 
progressive lengthening of latency. But 
the data for an individual organism do 
not show a continuous process. 
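
The averaging artifact described above can be illustrated with a small calculation. This is only a sketch under assumed numbers: the latencies, the share of delayed responses, and the function name are hypothetical and are not data from the experiments. Each simulated session keeps the same commonest latency, but a growing share of responses is delayed. The averaged value rises from session to session although the commonest value does not move.

```python
from statistics import mean, mode

def session_latencies(n_responses, n_delayed, typical=3, delayed=12):
    """Latencies in tenths of a second for one hypothetical session.

    Most responses keep the typical latency; `n_delayed` of them are long
    because other behavior is in progress when the stimulus appears.
    All values are illustrative assumptions, not measurements.
    """
    return [typical] * (n_responses - n_delayed) + [delayed] * n_delayed

# Successive sessions of extinction: more and more responses are delayed.
sessions = [session_latencies(20, k) for k in (0, 2, 4, 6)]

modes = [mode(s) for s in sessions]   # commonest value in each session
means = [mean(s) for s in sessions]   # averaged value in each session

print("modes:", modes)
print("means:", means)
```

The list of means looks like a progressive lengthening of latency, yet it is produced entirely by the changing proportion of delayed responses.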
Another datum to be examined is the 
rate at which a response is emitted. 
Fortunately the story here is different. 
We study this rate by designing a situa- 
tion in which a response may be freely 
repeated, choosing a response (for ex- 
ample, touching or pressing a small lever 
or key) that may be easily observed 
and counted. The responses may be re- 
corded on a polygraph, but a more con- 
venient form is a cumulative curve from 
which rate of responding is immediately 
read as slope. The rate at which a re- 
sponse is emitted in such a situation 
comes close to our preconception of 
the learning process. As the organism 
learns, the rate rises. As it unlearns 
(for example, in extinction) the rate 
falls. Various sorts of discriminative 
stimuli may be brought into control 
of the response with corresponding 
modifications of the rate. Motivational 
changes alter the rate in a sensitive 
way. So do those events which we 
speak of as generating emotion. The 
range through which the rate varies 
significantly may be as great as of the 
order of 1000:1. Changes in rate are 
satisfactorily smooth in the individual 
case, so that it is not necessary to average
cases. A given value is often quite
stable: in the pigeon a rate of four or 
five thousand responses per hour may 
be maintained without interruption for 
as long as fifteen hours. 
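
A cumulative curve of this kind is simple to construct. The following sketch assumes that response times are available as a sorted list of seconds from the start of the session; the function names and the sample record are hypothetical. The count of responses is taken as the height of the curve, and the rate over any interval is the slope of the curve across it.

```python
from bisect import bisect_right

def cumulative_count(response_times, t):
    """Height of the cumulative curve at time t (responses emitted so far)."""
    return bisect_right(response_times, t)

def rate(response_times, t_start, t_end):
    """Slope of the cumulative curve between two times, in responses per second."""
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    rise = cumulative_count(response_times, t_end) - cumulative_count(response_times, t_start)
    return rise / (t_end - t_start)

# Hypothetical record: one response per second for 10 s, then one every 5 s.
times = [float(s) for s in range(1, 11)] + [15.0, 20.0, 25.0, 30.0]

fast = rate(times, 0.0, 10.0)    # steep early slope
slow = rate(times, 10.0, 30.0)   # shallow later slope
per_hour = fast * 3600           # the same slope expressed per hour
```

For comparison, the four or five thousand responses per hour mentioned above correspond to a slope of roughly 1.1 to 1.4 responses per second.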

Rate of responding appears to be the 
only datum that varies significantly and 
in the expected direction under condi- 
tions which are relevant to the “learn- 
ing process.” We may, therefore, be 
tempted to accept it as our long-sought- 
for measure of strength of bond, excita- 
tory potential, etc. Once in possession 
of an effective datum, however, we may 
feel little need for any theoretical con- 
struct of this sort. Progress in a scien- 
tific field usually waits upon the dis- 
covery of a satisfactory dependent vari- 
able. Until such a variable has been 
discovered, we resort to theory. The 
entities which have figured so promi- 
nently in learning theory have served 
mainly as substitutes for a directly ob- 
servable and productive datum. They 
have little reason to survive when such 
a datum has been found. 

It is no accident that rate of respond- 
ing is successful as a datum, because it 
is particularly appropriate to the funda- 
mental task of a science of behavior. 
If we are to predict behavior (and pos- 
sibly to control it), we must deal with 
probability of response. The business 
of a science of behavior is to evaluate 
this probability and explore the condi- 
tions that determine it. Strength of 
bond, expectancy, excitatory potential, 
and so on, carry the notion of prob- 
ability in an easily imagined form, but 
the additional properties suggested by 
these terms have hindered the search for 
suitable measures. Rate of responding 
is not a “measure” of probability but it 
is the only appropriate datum in a 
formulation in these terms. 

As other scientific disciplines can at- 
test, probabilities are not easy to han- 
dle. We wish to make statements about 
the likelihood of occurrence of a single
future response, but our data are in the
form of frequencies of responses that 
have already occurred. These responses 
were presumably similar to each other 
and to the response to be predicted. 
But this raises the troublesome problem 
of response-instance vs. response-class. 
Precisely what responses are we to take 
into account in predicting a future in- 
stance? Certainly not the responses 
made by a population of different or- 
ganisms, for such a statistical datum 
raises more problems than it solves. To 
consider the frequency of repeated re- 
sponses in an individual demands some- 
thing like the experimental situation 
just described. 

This solution of the problem of a ba- 
sic datum is based upon the view that 
operant behavior is essentially an emis- 
sive phenomenon. Latency and magni- 
tude of response fail as measures be- 
cause they do not take this into ac- 
count. They are concepts appropriate 
to the field of the reflex, where the all 
but invariable control exercised by the 
eliciting stimulus makes the notion of 
probability of response trivial. Con- 
sider, for example, the case of latency. 
Because of our acquaintance with sim- 
ple reflexes we infer that a response that 
is more likely to be emitted will be 
emitted more quickly. But is this true? 
What can the word “quickly” mean? 
Probability of response, as well as pre- 
diction of response, is concerned with 
the moment of emission. This is a point 
in time, but it does not have the tem- 
poral dimension of a latency. The exe- 
cution may take time after the response 
has been initiated, but the moment of 
occurrence has no duration.3 In recognizing
the emissive character of operant
behavior and the central position of
probability of response as a datum, latency
is seen to be irrelevant to our
present task.

3 It cannot, in fact, be shortened or lengthened.
Where a latency appears to be forced
toward a minimal value by differential reinforcement,
another interpretation is called
for. Although we may differentially reinforce
more energetic behavior or the faster execution
of behavior after it begins, it is meaningless
to speak of differentially reinforcing responses
with short or long latencies. What
we actually reinforce differentially are (a)
favorable waiting behavior and (b) more vigorous
responses. When we ask a subject to
respond “as soon as possible” in the human
reaction-time experiment, we essentially ask
him (a) to carry out as much of the response
as possible without actually reaching the criterion
of emission, (b) to do as little else as
possible, and (c) to respond energetically after
the stimulus has been given. This may yield
a minimal measurable time between stimulus
and response, but this time is not necessarily
a basic datum nor have our instructions altered
it as such. A parallel interpretation of
the differential reinforcement of long “latencies”
is required. This is easily established by
inspection. In the experiments with pigeons
previously cited, preliminary behavior is conditioned
that postpones the response to the
key until the proper time. Behavior that
“marks time” is usually conspicuous.

Various objections have been made to
the use of rate of responding as a basic
datum. For example, such a program
may seem to bar us from dealing with
many events which are unique occurrences
in the life of the individual. A
man does not decide upon a career, get
married, make a million dollars, or get
killed in an accident often enough to
make a rate of response meaningful.
But these activities are not responses.
They are not simple unitary events lending
themselves to prediction as such. If
we are to predict marriage, success, accidents,
and so on, in anything more than
statistical terms, we must deal with the
smaller units of behavior which lead to
and compose these unitary episodes. If
the units appear in repeatable form, the
present analysis may be applied. In
the field of learning a similar objection
takes the form of asking how the present
analysis may be extended to experimental
situations in which it is impossible
to observe frequencies. It does
not follow that learning is not taking
place in such situations. The notion 
of probability is usually extrapolated to 
cases in which a frequency analysis can- 
not be carried out. In the field of be- 
havior we arrange a situation in which 
frequencies are available as data, but 
we use the notion of probability in 
analyzing and formulating instances or 
even types of behavior which are not 
susceptible to this analysis. 

Another common objection is that a 
rate of response is just a set of latencies 
and hence not a new datum at all. This 
is easily shown to be wrong. When we 
measure the time elapsing between two 
responses, we are in no doubt as to what 
the organism was doing when we started 
our clock. We know that it was just 
executing a response. This is a natural 
zero—quite unlike the arbitrary point 
from which latencies are measured. The 
free repetition of a response yields a 
rhythmic or periodic datum very differ- 
ent from latency. Many periodic physi- 
cal processes suggest parallels. 

We do not choose rate of responding 
as a basic datum merely from an analy- 
sis of the fundamental task of a science 
of behavior. The ultimate appeal is to 
its success in an experimental science. 
The material which follows is offered as 
a sample of what can be done. It is 
not intended as a complete demonstra- 
tion, but it should confirm the fact 
that when we are in possession of a 
datum which varies in a significant fash- 
ion, we are less likely to resort to theo- 
retical entities carrying the notion of 
probability of response. 


Why Learning Occurs 


We may define learning as a change 
in probability of response but we must 
also specify the conditions under which 
it comes about. To do this we must 
survey some of the independent variables
of which probability of response is
a function. Here we meet another kind
of learning theory. 

An effective class-room demonstration 
of the Law of Effect may be arranged 
in the following way. A pigeon, re- 
duced to 80 per cent of its ad lib weight, 
is habituated to a small, semi-circular 
amphitheatre and is fed there for sev- 
eral days from a food hopper, which the 
experimenter presents by closing a hand 
switch. The demonstration consists of 
establishing a selected response by suit- 
able reinforcement with food. For ex- 
ample, by sighting across the amphi- 
theatre at a scale on the opposite wall, 
it is possible to present the hopper 
whenever the top of the pigeon’s head 
rises above a given mark. Higher and 
higher marks are chosen until, within a 
few minutes, the pigeon is walking about 
the cage with its head held as high as 
possible. In another demonstration the 
bird is conditioned to strike a marble
placed on the floor of the amphitheatre.
This may be done in a few minutes by
reinforcing successive steps. Food is
presented first when the bird is merely 
moving near the marble, later when it 
looks down in the direction of the 
marble, later still when it moves its head 
toward the marble, and finally when it 
pecks it. Anyone who has seen such a 
demonstration knows that the Law of 
Effect is no theory. It simply specifies 
a procedure for altering the probability 
of a chosen response. 

But when we try to say why rein- 
forcement has this effect, theories arise. 
Learning is said to take place because 
the reinforcement is pleasant, satisfying, 
tension reducing, and so on. The con- 
verse process of extinction is explained 
with comparable theories. If the rate 
of responding is first raised to a high 
point by reinforcement and reinforcement
then withheld, the response is observed
to occur less and less frequently
thereafter. One common theory explains
this by asserting that a state is
built up which suppresses the behavior.
This “experimental inhibition” or “re- 
action inhibition” must be assigned to a 
different dimensional system, since noth- 
ing at the level of behavior corresponds 
to opposed processes of excitation and 
inhibition. Rate of responding is simply 
increased by one operation and de- 
creased by another. Certain effects 
commonly interpreted as showing re- 
lease from a suppressing force may be 
interpreted in other ways. Disinhibi- 
tion, for example, is not necessarily the 
uncovering of suppressed strength; it 
may be a sign of supplementary strength 
from an extraneous variable. The proc- 
ess of spontaneous recovery, often cited 
to support the notion of suppression, 
has an alternative explanation, to be 
noted in a moment. 

Let us evaluate the question of why 
learning takes place by turning again to 
some data. Since conditioning is usu- 
ally too rapid to be easily followed, the 
process of extinction will provide us 
with a more useful case. A number of 
different types of curves have been con- 
sistently obtained from rats and pigeons 
using various schedules of prior rein- 
forcement. By considering some of the 
relevant conditions we may see what 
room is left for theoretical processes. 

The mere passage of time between 
conditioning and extinction is a vari- 
able that has surprisingly little effect. 
The rat is too short-lived to make an 
extended experiment feasible, but the 
pigeon, which may live ten or fifteen 
years, is an ideal subject. More than 
five years ago, twenty pigeons were con- 
ditioned to strike a large translucent 
key upon which a complex visual pat- 
tern was projected. Reinforcement was 
contingent upon the maintenance of a 
high and steady rate of responding and 
upon striking a particular feature of the 
visual pattern. These birds were set 
aside in order to study retention. They 
were transferred to the usual living
quarters, where they served as breeders.
Small groups were tested for extinction 
at the end of six months, one year, two 
years, and four years. Before the test 
each bird was transferred to a separate 
living cage. A controlled feeding sched- 
ule was used to reduce the weight to ap- 
proximately 80 per cent of the ad lib 
weight. The bird was then fed in the 
dimly lighted experimental apparatus in 
the absence of the key for several days, 
during which emotional responses to the 
apparatus disappeared. On the day of 
the test the bird was placed in the dark- 
ened box. The translucent key was 
present but not lighted. No responses 
were made. When the pattern was
projected upon the key, all four birds
responded quickly and extensively. Fig. 
2 shows the largest curve obtained. 
This bird struck the key within two 
seconds after presentation of a visual 
pattern that it had not seen for four 
years, and at the precise spot upon
which differential reinforcement had
previously been based. It continued to 
respond for the next hour, emitting 
about 700 responses. This is of the or- 
der of one-half to one-quarter of the 
responses it would have emitted if ex- 
tinction had not been delayed four 
years, but otherwise, the curve is fairly 
typical. 

Level of motivation is another variable
to be taken into account. An example
of the effect of hunger has been
reported elsewhere (3). The response
of pressing a lever was established in 
eight rats with a schedule of periodic 
reinforcement. They were fed the main 
part of their ration on alternate days so 
that the rates of responding on succes- 
sive days were alternately high and low. 
Two subgroups of four rats each were 
matched on the basis of the rate main- 
tained under periodic reinforcement un- 
der these conditions. The response was 
then extinguished—in one group on
alternate days when the hunger was high,
in the other group on alternate days 
when the hunger was low. (The same 
amount of food was eaten on the non- 
experimental days as before.) The re- 
sult is shown in Fig. 3. The upper 
graph gives the raw data. The levels of 
hunger are indicated by the points at P 
on the abscissa, the rates prevailing un- 
der periodic reinforcement. The subse- 
quent points show the decline in extinc- 
tion. If we multiply the lower curve 
through by a factor chosen to super- 
impose the points at P, the curves are 
reasonably closely superimposed, as 
shown in the lower graph. Several other 
experiments on both rats and pigeons 
have confirmed this general principle. 
If a given ratio of responding prevails 
under periodic reinforcement, the slopes 
of later extinction curves show the same 
ratio. Level of hunger determines the 
slope of the extinction curve but not its 
curvature. 
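
The superimposition described above can be sketched numerically. The following is a minimal illustration in Python, not the original analysis. The rates are invented values chosen only to show the procedure, and the function name `superimpose` is hypothetical.

```python
def superimpose(reference, other):
    """Scale `other` so that its first point (the rate at P) equals that of `reference`."""
    factor = reference[0] / other[0]
    return factor, [factor * rate for rate in other]


# Hypothetical responses per hour: the point at P followed by four extinction days.
high_hunger = [400.0, 300.0, 200.0, 120.0, 60.0]
low_hunger = [200.0, 150.0, 100.0, 60.0, 30.0]

factor, scaled_low = superimpose(high_hunger, low_hunger)

# If hunger alters only the slope, the scaled curve falls on the reference curve.
max_gap = max(abs(h - s) for h, s in zip(high_hunger, scaled_low))
print(factor, max_gap)
```

A gap near zero after scaling is what a change in slope alone would produce; a change in curvature would leave a residual that no single factor removes.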








[Fig. 2. Extinction 4 years after conditioning. Abscissa: minutes.]

[Fig. 3. Responses per hour in daily periods of one hour each, for low and high degrees of hunger.]


Another variable, difficulty of re- 
sponse, is especially relevant because it 
has been used to test the theory of re- 
action inhibition (1), on the assump- 
tion that a response requiring consider- 
able energy will build up more reaction 
inhibition than an easy response and 
lead, therefore, to faster extinction. 
The theory requires that the curvature 
of the extinction curve be altered, not 
merely its slope. Yet there is evidence 
that difficulty of response acts like level 
of hunger simply to alter the slope. 
Some data have been reported but not 
published (5). A pigeon is suspended 
in a jacket which confines its wings and
legs but leaves its head and neck free to
respond to a key and a food magazine. 
Its behavior in this situation is quanti- 
tatively much like that of a bird moving 
freely in an experimental box. But the 
use of the jacket has the advantage that 
the response to the key may be made 
easy or difficult by changing the dis- 
tance the bird must reach. In one ex- 
periment these distances were expressed 
in seven equal but arbitrary units. At 
distance 7 the bird could barely reach 
the key, at 3 it could strike without ap- 
preciably extending its neck. Periodic 
reinforcement gave a straight base-line 
upon which it was possible to observe 
the effect of difficulty by quickly chang- 
ing position during the experimental pe- 
riod. Each of the five records in Fig. 4 
covers a fifteen minute experimental pe- 
riod under periodic reinforcement. Dis- 
tances of the bird from the key are indi- 
cated by numerals above the records. 
It will be observed that the rate of re- 
sponding at distance 7 is generally quite 
low while that at distance 3 is high. 
Intermediate distances produce intermediate
slopes. It should also be noted
that the change from one position to 
another is felt immediately. If repeated 
responding in a difficult position were 
to build a considerable amount of re- 
action inhibition, we should expect the 
rate to be low for some little time after 
returning to an easy response. Con- 
trariwise, if an easy response were to 
build little reaction inhibition, we should 
expect a fairly high rate of responding 
for some time after a difficult position 
is assumed. Nothing like this occurs. 
The “more rapid extinction” of a diffi- 
cult response is an ambiguous expres- 
sion. The slope constant is affected and 
with it the number of responses in ex- 
tinction to a criterion, but there may 
be no effect upon curvature. 

One way of considering the question 
of why extinction curves are curved is 
to regard extinction as a process of exhaustion
comparable to the loss of heat
from source to sink or the fall in the 
level of a reservoir when an outlet is 
opened. Conditioning builds up a pre- 
disposition to respond—a “reserve” — 
which extinction exhausts. This is per- 
haps a defensible description at the level 
of behavior. The reserve is not neces- 
sarily a theory in the present sense, 
since it is not assigned to a different di- 
mensional system. It could be opera- 
tionally defined as a predicted extinc- 
tion curve, even though, linguistically, it 
makes a statement about the momentary 
condition of a response. But it is not a
particularly useful concept, nor does the
view that extinction is a process of ex- 
haustion add much to the observed fact 
that extinction curves are curved in a 
certain way. 

There are, however, two variables 
that affect the rate, both of which oper- 
ate during extinction to alter the curva- 
ture. One of these falls within the field 
of emotion. When we fail to reinforce 
a response that has previously been re- 
inforced, we not only initiate a process 
of extinction, we set up an emotional 
response—perhaps what is often meant 
by frustration. The pigeon coos in an
identifiable pattern, moves rapidly about
the cage, defecates, or flaps its wings 
rapidly in a squatting position that sug- 
gests treading (mating) behavior. This 
competes with the response of striking 
a key and is perhaps enough to account 
for the decline in rate in early extinc- 
tion. It is also possible that the prob- 
ability of a response based upon food 
deprivation is directly reduced as part 
of such an emotional reaction. What- 
ever its nature, the effect of this vari- 
able is eliminated through adaptation. 
Repeated extinction curves become 
smoother, and in some of the schedules 
to be described shortly there is little or 
no evidence of an emotional modifica- 
tion of rate. 

A second variable has a much more 
serious effect. Maximal responding dur- 
ing extinction is obtained only when the 
conditions under which the response was 
reinforced are precisely reproduced. A 
rat conditioned in the presence of a
light will not extinguish fully in the absence
of the light. It will begin to respond
more rapidly when the light is
again introduced. This is true for other 
kinds of stimuli, as the following class- 
room experiment illustrates. Nine pi- 
geons were conditioned to strike a yel- 
low triangle under intermittent reinforce- 
ment. In the session represented by 
Fig. 5 the birds were first reinforced on 
this schedule for 30 minutes. The com- 
bined cumulative curve is essentially a 
straight line, showing more than 1100 
responses per bird during this period. 
A red triangle was then substituted for 
the yellow and no responses were rein- 
forced thereafter. The effect was a 
sharp drop in responding, with only a 
slight recovery during the next fifteen 
minutes. When the yellow triangle was 
replaced, rapid responding began im- 
mediately and the usual extinction curve 
followed. Similar experiments have 
shown that the pitch of an incidental 
tone, the shape of a pattern being
struck, or the size of a pattern, if present
during conditioning, will to some
extent control the rate of responding 
during extinction. Some properties are 
more effective than others, and a quan- 
titative evaluation is possible. By 
changing to several values of a stimu- 
lus in random order repeatedly during 
the extinction process, the gradient for 
stimulus generalization may be read 
directly in the rates of responding un- 
der each value. 
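
Reading a gradient from rates can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the stimulus labels, counts, and durations are invented, "orange" is a hypothetical intermediate value, and `rates_by_stimulus` is not a procedure named in the text.

```python
from collections import defaultdict


def rates_by_stimulus(trials):
    """Return responses per minute for each stimulus value.

    `trials` is a list of (stimulus_value, responses, minutes) tuples,
    one per presentation during extinction.
    """
    responses = defaultdict(float)
    minutes = defaultdict(float)
    for value, count, duration in trials:
        responses[value] += count
        minutes[value] += duration
    return {value: responses[value] / minutes[value] for value in responses}


# Hypothetical record: stimulus values presented in random order, repeatedly.
trials = [
    ("yellow", 40, 1.0), ("orange", 22, 1.0), ("red", 6, 1.0),
    ("orange", 18, 1.0), ("yellow", 32, 1.0), ("red", 2, 1.0),
]
gradient = rates_by_stimulus(trials)
print(gradient)
```

Pooling the presentations of each value before dividing keeps the declining overall rate in extinction from being confused with the effect of the stimulus, provided the values are well interleaved.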

Something very much like this must 
go on during extinction. Let us suppose 
that all responses to a key have been 
reinforced and that each has been fol- 
lowed by a short period of eating. 
When we extinguish the behavior, we 
create a situation in which responses 
are not reinforced, in which no eating 
takes place, and in which there are 
probably new emotional responses. The 
situation could easily be as novel as a 
red triangle after a yellow. If so, it 
could explain the decline in rate during 
extinction. We might have obtained a
smooth curve, shaped like an extinction
curve, between the vertical lines in Fig. 
5 by gradually changing the color of the 
triangle from yellow to red. This might 
have happened even though no other 
sort of extinction were taking place. 
The very conditions of extinction seem 
to presuppose a growing novelty in the 
experimental situation. Is this why the 
extinction curve is curved? 

Some evidence comes from the data 
of “spontaneous recovery.” Even after 
prolonged extinction an organism will 
often respond at a higher rate for at 
least a few moments at the beginning of 
another session. One theory contends 
that this shows spontaneous recovery 
from some sort of inhibition, but another
explanation is possible. No matter
how carefully an animal is handled,
the stimulation coincident with the beginning
of an experiment must be extensive
and unlike anything occurring
in the later part of an experimental period.
Responses have been reinforced
in the presence of, or shortly following,
this stimulation. In extinction it is
present for only a few moments. When
the organism is again placed in the experimental
situation, the stimulation is
restored; further responses are emitted
as in the case of the yellow triangle.
The only way to achieve full extinction 
in the presence of the stimulation of 
starting an experiment is to start the ex- 
periment repeatedly. 

Other evidence of the effect of nov- 
elty comes from the study of periodic 
reinforcement. The fact that intermit- 
tent reinforcement produces bigger ex- 
tinction curves than continuous rein- 
forcement is a troublesome difficulty 
for those who expect a simple relation 
between number of reinforcements and 
number of responses in extinction. But 
this relation is actually quite complex. 
One result of periodic reinforcement is 
that emotional changes adapt out. This 
may be responsible for the smoothness 
of subsequent extinction curves but 
probably not for their greater extent. 
The latter may be attributed to the lack 
of novelty in the extinction situation. 
Under periodic reinforcement many re- 
sponses are made without reinforcement 
and when no eating has recently taken 
place. The situation in extinction is 
therefore not wholly novel. 

Periodic reinforcement is not, however,
a simple solution. If we reinforce
on a regular schedule—say, every minute—the
organism soon forms a discrimination.
Little or no responding
occurs just after reinforcement, since 
stimulation from eating is correlated 
with absence of subsequent reinforcement.
How rapidly the discrimination
may develop is shown in Fig. 6, which
reproduces the first five curves obtained
from a pigeon under periodic reinforcement
in experimental periods of fifteen
minutes each. In the fifth period (or
after about one hour of periodic rein- 
forcement) the discrimination yields a 
pause after each reinforcement, resulting
in a markedly stepwise curve. As
a result of this discrimination the bird
is almost always responding rapidly
when reinforced. This is the basis for
another discrimination. Rapid responding
becomes a favorable stimulating
condition. A good example of the effect
upon the subsequent extinction
curve is shown in Fig. 7. This pigeon
had been reinforced once every minute
during daily experimental periods of
fifteen minutes each for several weeks.
In the extinction curve shown, the bird
begins to respond at the rate prevailing
under the preceding schedule. A quick
positive acceleration at the start is lost
in the reduction of the record. The
pigeon quickly reaches and sustains a
rate that is higher than the overall rate
during periodic reinforcement. During
this period the pigeon creates a stimu- 
lating condition previously optimally 
correlated with reinforcement. Even- 
tually, as some sort of exhaustion inter- 
venes, the rate falls off rapidly to a 
much lower but fairly stable value and 
then to practically zero. A condition 
then prevails under which a response is 
not normally reinforced. The bird is 
therefore not likely to begin to respond 
again. When it does respond, however, 
the situation is slightly improved and, 
if it continues to respond, the condi- 
tions rapidly become similar to those 
under which reinforcement has been re- 
ceived. Under this “autocatalysis” a 
high rate is quickly reached, and more 
than 500 responses are emitted in a 
second burst. The rate then declines 
quickly and fairly smoothly, again to 
nearly zero. This curve is not by any 
means disorderly. Most of the curvature
is smooth. But the burst of responding
at forty-five minutes shows
a considerable residual strength which, 
if extinction were merely exhaustion, 
should have appeared earlier in the 
curve. The curve may be reasonably 
accounted for by assuming that the
bird is largely controlled by the preceding
spurious correlation between reinforcement
and rapid responding.

This assumption may be checked by 
constructing a schedule of reinforce- 
ment in which a differential contingency 
between rate of responding and rein- 
forcement is impossible. In one such 
schedule of what may be called “aperi- 
odic reinforcement” one interval be- 
tween successive reinforced responses is 
so short that no unreinforced responses 
intervene while the longest interval is 
about two minutes. Other intervals are 
distributed arithmetically between these 
values, the average remaining one min- 
ute. The intervals are roughly random- 
ized to compose a program of reinforce- 
ment. Under this program the prob- 
ability of reinforcement does not change 
with respect to previous reinforcements, 
and the curves never acquire the step- 
wise character of curve E in Fig. 6. 
(Fig. 9 shows curves from a similar 
program.) As a result no correlation
between different rates of responding
and different probabilities of reinforce- 
ment can develop. 
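
The construction of such a program can be sketched in Python. This is an illustration under stated assumptions rather than the apparatus actually used: the number of intervals is not given in the text, so thirteen is assumed here, and the function names are hypothetical.

```python
import random


def arithmetic_intervals(shortest, longest, count):
    """Return `count` intervals evenly spaced from `shortest` to `longest` (seconds)."""
    step = (longest - shortest) / (count - 1)
    return [shortest + i * step for i in range(count)]


def randomized_program(intervals, seed=None):
    """Return the intervals in shuffled order without altering the input list."""
    program = list(intervals)
    random.Random(seed).shuffle(program)
    return program


# Assumed values: 13 intervals from 0 s to 120 s; the count is not given in the text.
intervals = arithmetic_intervals(0.0, 120.0, 13)
program = randomized_program(intervals, seed=1)
mean_interval = sum(program) / len(program)
print(program, mean_interval)
```

Because the series is symmetric about its midpoint, the mean stays at one minute whatever the order, while the shuffled order keeps the time since the last reinforcement from predicting the next one.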

An extinction curve following a brief 
exposure to aperiodic reinforcement is
shown in Fig. 8. It begins characteristically
at the rate prevailing under
aperiodic reinforcement and, unlike the
curve following regular periodic reinforcement,
does not accelerate to a
higher overall rate. There is no evi- 
dence of the “autocatalytic” production 
of an optimal stimulating condition. 
Also characteristically, there are no 
significant discontinuities or sudden 
changes in rate in either direction. The 
curve extends over a period of eight 
hours, as against not quite two hours 
in Fig. 7, and seems to represent a 
single orderly process. The total number
of responses is higher, perhaps because
of the greater time allowed for
emission. All of this can be explained 
by the single fact that we have made it 
impossible for the pigeon to form a pair 
of discriminations based, first, upon 
stimulation from eating and, second, 
upon stimulation from rapid responding. 

Since the longest interval between re- 
inforcement was only two minutes, a 
certain novelty must still have been in- 
troduced as time passed. Whether this 
explains the curvature in Fig. 8 may be 
tested to some extent with other pro- 
grams of reinforcement containing much 
longer intervals. A geometric progres- 
sion was constructed by beginning with 
10 seconds as the shortest interval and 
repeatedly multiplying through by a 
ratio of 1.54. This yielded a set of 
intervals averaging 5 minutes, the long- 
est of which was more than 21 minutes. 
Such a set was randomized in a program
of reinforcement repeated every hour.
In changing to this program from the 
arithmetic series, the rates first declined 
during the longer intervals, but the pi- 
geons were soon able to sustain a con- 
stant rate of responding under it. Two 
records in the form in which they were 
recorded are shown in Fig. 9. (The 
pen resets to zero after every thousand 
responses. In order to obtain a single 
cumulative curve it would be necessary 
to cut the record and to piece the sec- 
tions together to yield a continuous line. 
The raw form may be reproduced with 
less reduction.) Each reinforcement is 
represented by a horizontal dash. The 
time covered is about 3 hours. Records 
are shown for two pigeons that main- 
tained different overall rates under this 
program of reinforcement. 
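
The geometric construction can be sketched in the same way. The Python fragment below is illustrative only. The shortest interval and the ratio come from the text; the number of terms is not stated, and twelve is an assumption chosen because it nearly fills one hour.

```python
def geometric_intervals(shortest, ratio, count):
    """Return `count` intervals (seconds), each `ratio` times the one before."""
    return [shortest * ratio ** k for k in range(count)]


# 10 s and 1.54 are from the text; 12 terms is an assumption.
intervals = geometric_intervals(10.0, 1.54, 12)

total_minutes = sum(intervals) / 60
mean_minutes = total_minutes / len(intervals)
longest_minutes = max(intervals) / 60
print(round(total_minutes, 1), round(mean_minutes, 2), round(longest_minutes, 1))
```

Under this assumption the mean and the longest interval come out somewhat below the figures reported above, so the set actually used was presumably constructed or rounded differently. The sketch shows only how a few very long intervals arise from a constant ratio.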

Under such a schedule a constant rate 
of responding is sustained for at least 
21 minutes without reinforcement, after 
which a reinforcement is received. Less 
novelty should therefore develop during 
succeeding extinction. In Curve 1 of 
Fig. 10 the pigeon had been exposed to 
several sessions of several hours each 
with this geometric set of intervals. 
The number of responses emitted in ex- 
tinction is about twice that of the curve 
in Fig. 8 after the arithmetic set of intervals
averaging one minute, but the
curves are otherwise much alike. Fur- 
ther exposure to the geometric sched- 
ule builds up longer runs during which 
the rate does not change significantly. 
Curve 2 followed Curve 1 after two and 
one-half hours of further aperiodic reinforcement.
On the day shown in
Curve 2 a few aperiodic reinforcements
were first given, as marked at the beginning
of the curve. When reinforcement
was discontinued, a fairly constant
rate of responding prevailed for
several thousand responses. After an- 
other experimental session of two and 
one-half hours with the geometric se- 
ries, Curve 3 was recorded. This ses- 
sion also began with a short series of 
aperiodic reinforcements, followed by a 
sustained run of more than 6000 unrein- 
forced responses with little change in 
rate (A). There seems to be no reason 
why other series averaging perhaps more 
than five minutes per interval and con- 
taining much longer exceptional inter- 
vals would not carry such a straight 
line much further. 

In this attack upon the problem of ex- 
tinction we create a schedule of rein- 
forcement which is so much like the 
conditions that will prevail during extinction
that no decline in rate takes
place for a long time. In other words
we generate extinction with no curva- 
ture. Eventually some kind of exhaus- 
tion sets in, but it is not approached 
gradually. The last part of Curve 3 
(unfortunately much reduced in the fig- 
ure) may possibly suggest exhaustion in 
the slight overall curvature, but it is a 
small part of the whole process. The 
record is composed mainly of runs of a 
few hundred responses each, most of 
them at approximately the same rate as 
that maintained under periodic rein- 
forcement. The pigeon stops abruptly; 
when it starts to respond again, it 
quickly reaches the rate of responding 
under which it was reinforced. This
recalls the spurious correlation between
rapid responding and reinforcement under
regular reinforcement. We have
not, of course, entirely eliminated this 
correlation. Even though there is no 
longer a differential reinforcement of 
high against low rates, practically all 
reinforcements have occurred under a 
constant rate of responding. 

Further study of reinforcing sched- 
ules may or may not answer the ques- 
tion of whether the novelty appearing in 
the extinction situation is entirely re- 
sponsible for the curvature. It would 
appear to be necessary to make the 
conditions prevailing during extinction 
identical with the conditions prevailing 
during conditioning. This may be im- 
possible, but in that case the question is 
academic. The hypothesis, meanwhile, 
is not a theory in the present sense,
since it makes no statements about a
parallel process in any other universe of 
discourse.* 

The study of extinction after differ- 
ent schedules of aperiodic reinforcement 
is not addressed wholly to this hypothe- 
sis. The object is an economical de- 
scription of the conditions prevailing 
during reinforcement and extinction and 
of the relations between them. In using
rate of responding as a basic datum
we may appeal to conditions that are
observable and manipulable and we may
express the relations between them in 
objective terms. To the extent that our 
datum makes this possible, it reduces 
the need for theory. When we observe 
a pigeon emitting 7000 responses at a 
constant rate without reinforcement, we 
are not likely to explain an extinction 
curve containing perhaps a few hundred 
responses by appeal to the piling up of 
reaction inhibition or any other fatigue 
product. Research which is conducted 
without commitment to theory is more 
likely to carry the study of extinction 
into new areas and new orders of 
magnitude. By hastening the accumu- 
lation of data, we speed the departure 
of theories. If the theories have played 
no part in the design of our experi- 
ments, we need not be sorry to see 
them go. 


Complex Learning 


A third type of learning theory is
illustrated by terms like preferring,
choosing, discriminating, and matching.
An effort may be made to define these
solely in terms of behavior, but in traditional
practice they refer to processes
in another dimensional system. A response
to one of two available stimuli
may be called choice, but it is commoner
to say that it is the result of
choice, meaning by the latter a theoretical
pre-behavioral activity. The
higher mental processes are the best
examples of theories of this sort; neurological
parallels have not been well
worked out. The appeal to theory is
encouraged by the fact that choosing
(like discriminating, matching, and so
on) is not a particular piece of behavior.
It is not a response or an act with specified
topography. The term characterizes
a larger segment of behavior in
relation to other variables or events.
Can we formulate and study the behavior
to which these terms would usually
be applied without recourse to the theories
which generally accompany them?

*It is true that it appeals to stimulation
generated in part by the pigeon’s own behavior.
This may be difficult to specify or
manipulate, but it is not theoretical in the
present sense. So long as we are willing to
assume a one-to-one correspondence between
action and stimulation, a physical specification
is possible.

Discrimination is a relatively simple 
case. Suppose we find that the prob- 
ability of emission of a given response 
is not significantly affected by chang- 
ing from one of two stimuli to the other. 
We then make reinforcement of the re- 
sponse contingent upon the presence of 
one of them. The well-established re- 
sult is that the probability of response 
remains high under this stimulus and 
reaches a very low point under the 
other. We say that the organism now 
discriminates between the stimuli. But 
discrimination is not itself an action, or 
necessarily even a unique process. Prob- 
lems in the field of discrimination may 
be stated in other terms. How much 
induction obtains between stimuli of 
different magnitudes or classes? What 
are the smallest differences in stimuli 
that yield a difference in control? And 
so on. Questions of this sort do not 
presuppose theoretical activities in other 
dimensional systems. 

A somewhat larger segment must be 
specified in dealing with the behavior of 
choosing one of two concurrent stimuli. 
This has been studied in the pigeon by 
examining responses to two keys differing
in position (right or left) or in some
property like color randomized with re- 
spect to position. By occasionally rein- 
forcing a response on one key or the 
other without favoring either key, we 
obtain equal rates of responding on the 
two keys. The behavior approaches a 
simple alternation from one key to the 
other. This follows the rule that tend- 
encies to respond eventually correspond 
to the probabilities of reinforcement. 
Given a system in which one key or the 
other is occasionally connected with the 
magazine by an external clock, then if 
the right key has just been struck, the 
probability of reinforcement via the left 
key is higher than that via the right 
since a greater interval of time has 
elapsed during which the clock may 
have closed the circuit to the left key. 
But the bird’s behavior does not cor- 
respond to this probability merely out 
of respect for mathematics. The specific
result of such a contingency of
reinforcement is that changing-to-the-other-key-and-striking
is more often reinforced
than striking-the-same-key-a-second-time.
We are no longer dealing
with just two responses. In order to 
analyze “choice” we must consider a 
single final response, striking, without 
respect to the position or color of the 
key, and in addition the responses of 
changing from one key or color to the 
other. 
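
The contingency can be made concrete with a simplified model. The Python sketch below is an assumption-laden illustration, not a description of the actual apparatus: it supposes that in each second the clock connects a given key with a fixed probability `p_per_second`, and that a connected key stays connected until struck.

```python
def prob_armed(p_per_second, seconds_since_last_response):
    """Probability that the clock has connected a key at some point since it was last struck."""
    return 1.0 - (1.0 - p_per_second) ** seconds_since_last_response


p = 0.01  # assumed probability per second that the clock closes the circuit to a key

same_key = prob_armed(p, 1)   # the key struck one second ago
other_key = prob_armed(p, 5)  # the key last struck five seconds ago
print(same_key, other_key)
```

On these assumptions the key that has gone longer without a response always carries the higher probability, so changing over is reinforced more often than striking the same key again, whatever the value of `p`.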

Quantitative results are compatible 
with this analysis. If we periodically 
reinforce responses to the right key 
only, the rate of responding on the 
right will rise while that on the left will 
fall. The response of changing-from- 
right-to-left is never reinforced while 
the response of changing-from-left-to- 
right is occasionally so. When the bird 
is striking on the right, there is no great 
tendency to change keys; when it is 
striking on the left, there is a strong 
tendency to change. Many more responses
come to be made to the right
key. The need for considering the behavior
of changing over is clearly shown
if we now reverse these conditions and 
reinforce responses to the left key only. 
The ultimate result is a high rate of re- 
sponding on the left key and a low rate 
on the right. By reversing the condi- 
tions again the high rate can be shifted 
back to the right key. In Fig. 11 a 
group of eight curves have been aver- 
aged to follow this change during six 
experimental periods of 45 minutes 
each. Beginning on the second day in 
the graph responses to the right key
(R_R) decline in extinction while responses
to the left key (R_L) increase
through periodic reinforcement. The
mean rate shows no significant variation,
since periodic reinforcement is continued
on the same schedule. The mean
rate shows the condition of strength of 
the response of striking a key regard- 
less of position. The distribution of re- 
sponses between right and left depends 
upon the relative strength of the responses
of changing over. If this were
simply a case of the extinction of one
response and the concurrent reconditioning
of another, the mean curve
would not remain approximately horizontal
since reconditioning occurs much
more rapidly than extinction.⁵

5 Two topographically independent responses,
capable of emission at the same time and hence
not requiring change-over, show separate processes
of reconditioning and extinction, and the
combined rate of responding varies

The rate with which the bird changes
from one key to the other depends upon
the distance between the keys. This distance
is a rough measure of the stimulus-difference
between the two keys. It
also determines the scope of the response
of changing-over, with an implied
difference in sensory feed-back.
It also modifies the spread of reinforce- 
ment to responses supposedly not rein- 
forced, since if the keys are close to- 
gether, a response reinforced on one 
side may occur sooner after a preceding 
response on the other side. In Fig. 11 
the two keys were about one inch apart. 
They were therefore fairly similar with 
respect to position in the experimental 
box. Changing from one to the other 
involved a minimum of sensory feed- 
back, and reinforcement of a response 
to one key could follow very shortly 
upon a response to the other. When 
the keys are separated by as much as 
four inches, the change in strength is 
much more rapid. Fig. 12 shows two 
curves recorded simultaneously from a 
single pigeon during one experimental 
period of about 40 minutes. A high rate
to the right key and a low rate to the
left had previously been established. In 
the figure no responses to the right were 
reinforced, but those to the left were re- 
inforced every minute as indicated by 
the vertical dashes above curve L. The 
slope of R declines in a fairly smooth 
fashion while that of L increases, also 
fairly smoothly, to a value comparable 
to the initial value of R. The bird has 
conformed to the changed contingency 
within a single experimental period. 
The mean rate of responding is shown 
by a dotted line, which again shows no 
significant curvature. 

What is called “preference” enters 
into this formulation. At any stage of 
the process shown in Fig. 12 preference 
might be expressed in terms of the rela- 
tive rates of responding to the two keys. 
This preference, however, is not in strik- 
ing a key but in changing from one key 
to the other. The probability that the 
bird will strike a key regardless of its
identifying properties behaves independently
of the preferential response of
changing from one key to the other.
Several experiments have revealed an
additional fact. A preference remains
fixed if reinforcement is withheld. Fig. 
13 is an example. It shows simultane- 
ous extinction curves from two keys 
during seven daily experimental periods 
of one hour each. Prior to extinction 
the relative strength of the responses 
of changing-to-R and changing-to-L 
yielded a “preference” of about 3 to 1
for R. The constancy of the rate
throughout the process of extinction has 
been shown in the figure by multiply- 
ing L through by a suitable constant 
and entering the points as small circles 
on R. If extinction altered the prefer- 
ence, the two curves could not be super- 
imposed in this way. 

These formulations of discrimination 
and choosing enable us to deal with 
what is generally regarded as a much 
more complex process—matching to 
sample. Suppose we arrange three 
translucent keys, each of which may be 
illuminated with red or green light. The 
middle key functions as the sample and 
we color it either red or green in ran- 
dom order. We color the two side keys 
one red and one green, also in random 
order. The “problem” is to strike the 
side key which corresponds in color to 
the middle key. There are only four 
three-key patterns in such a case, and 
it is possible that a pigeon could learn 
to make an appropriate response to each 
pattern. This does not happen, at least 
within the temporal span of the experi- 
ments to date. If we simply present a 
series of settings of the three colors and 
reinforce successful responses, the pi- 
geon will strike the side keys without 
respect to color or pattern and be rein- 
forced 50 per cent of the time. This is, 
in effect, a schedule of “fixed ratio” re- 
inforcement which is adequate to main- 
tain a high rate of responding. 
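
The count of four patterns and the 50 per cent figure can be checked by enumeration. The sketch below assumes that the four settings are equally likely, which is one reading of "random order"; the function and strategy names are hypothetical.

```python
from itertools import product

COLORS = ("red", "green")


def three_key_patterns():
    """Every setting: a sample color in the middle, one red and one green side key."""
    patterns = []
    for sample, left in product(COLORS, COLORS):
        right = "green" if left == "red" else "red"
        patterns.append((left, sample, right))
    return patterns


def reinforced_fraction(choose_side):
    """Fraction of settings on which the key struck has the sample's color."""
    patterns = three_key_patterns()
    hits = 0
    for left, sample, right in patterns:
        struck = left if choose_side(left, sample, right) == "left" else right
        hits += struck == sample
    return hits / len(patterns)


def always_left(left, sample, right):
    return "left"                                  # ignores color entirely


def always_red(left, sample, right):
    return "left" if left == "red" else "right"    # ignores the sample


def matching(left, sample, right):
    return "left" if left == sample else "right"


print(len(three_key_patterns()))          # 4
print(reinforced_fraction(always_left))   # 0.5
print(reinforced_fraction(always_red))    # 0.5
print(reinforced_fraction(matching))      # 1.0
```

A bird that strikes one key, or one color, without respect to the sample is thus reinforced on half of the settings, which is the figure given above.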

Nevertheless it is possible to get a 
pigeon to match to sample by reinforcing
the discriminative responses of
striking-red-after-being-stimulated-by-red
and striking-green-after-being-stimulated-by-green
while extinguishing
the other two possibilities. The diffi- 
culty is in arranging the proper stimu- 
lation at the time of the response. The 
sample might be made conspicuous— 
for example, by having the sample color 
in the general illumination of the
experimental box. In such a case the
pigeon would learn to strike red keys in
a red light and green keys in a green 
light (assuming a neutral illumination 
of the background of the keys). But 
a procedure which holds more closely 
to the notion of matching is to induce 
the pigeon to “look at the sample” by 
means of a separate reinforcement. We 
may do this by presenting the color on 
the middle key first, leaving the side 
keys uncolored. A response to the mid- 
dle key is then reinforced (secondarily) 
by illuminating the side keys. The pi- 
geon learns to make two responses in 
quick succession—to the middle key and 
then to one side key. The response to 
the side key follows quickly upon the 
visual stimulation from the middle key, 
which is the requisite condition for a 
discrimination. Successful matching was 
readily established in all ten pigeons 
tested with this technique. Choosing
the opposite is also easily set up. The
discriminative response of striking-red- 
after-being-stimulated-by-red is appar- 
ently no easier to establish than strik- 
ing-red-after-being-stimulated-by-green. 
When the response is to a key of the 
same color, however, generalization may 
make it possible for the bird to match a 
new color. This is an extension of the 
notion of matching that has not yet 
been studied with this method. 

Even when matching behavior has 
been well established, the bird will not 
respond correctly if all three keys are 
now presented at the same time. The 
bird does not possess strong behavior 
of looking at the sample. The experi- 
menter must maintain a separate rein- 
forcement to keep this behavior in 
strength. In monkeys, apes, and hu- 
man subjects the ultimate success in 
choosing is apparently sufficient to re- 
inforce and maintain the behavior of 
looking at the sample. It is possible 
that this species difference is simply a 
difference in the temporal relations required
for reinforcement.

The behavior of matching survives
unchanged when all reinforcement is 
withheld. An intermediate case has 
been established in which the correct 
matching response is only periodically 
reinforced. In one experiment one color 
appeared on the middle key for one 
minute; it was then changed or not 
changed, at random, to the other color. 
A response to this key illuminated the 
side keys, one red and one green, in 
random order. A response to a side 
key cut off the illumination to both side 
keys, until the middle key had again 
been struck. The apparatus recorded 
all matching responses on one graph 
and all non-matching on another. Pi- 
geons which have acquired matching 
behavior under continuous reinforce- 
ment have maintained this behavior 
when reinforced no oftener than once 
per minute on the average. They may 
make thousands of matching responses 
per hour while being reinforced for no 
more than sixty of them. This schedule
will not necessarily develop matching
behavior in a naive bird, for the
problem can be solved in three ways. 
The bird will receive practically as 
many reinforcements if it responds to 
(1) only one key or (2) only one color, 
since the programming of the experi- 
ment makes any persistent response 
eventually the correct one. 
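
Why a persistent response to one key earns practically as many reinforcements can be seen in a rough simulation. The model is a simplification resting on assumptions not stated in the text: one side-key response per second, a reinforcement set up every 60 seconds exactly (rather than once per minute on the average), and each reinforcement held until the next matching response. All names are hypothetical. The one-color case is left out because its outcome depends on scheduling details the text does not give.

```python
import random

COLORS = ("red", "green")


def reinforcements_per_hour(strategy, seconds=3600, interval=60, seed=0):
    """Count reinforcements earned in one simulated hour of responding."""
    rng = random.Random(seed)
    sample = None
    available = False
    earned = 0
    for t in range(seconds):
        if t % interval == 0:
            sample = rng.choice(COLORS)   # changed or not changed, at random
            available = True              # a reinforcement is set up and held
        left = rng.choice(COLORS)         # side keys: one red, one green
        right = "green" if left == "red" else "red"
        struck = left if strategy(sample, left, right) == "left" else right
        if available and struck == sample:
            earned += 1                   # the first matching response collects it
            available = False
    return earned


def match(sample, left, right):
    return "left" if left == sample else "right"


def left_key_only(sample, left, right):
    return "left"


print(reinforcements_per_hour(match))          # 60
print(reinforcements_per_hour(left_key_only))  # practically the same count
```

Under these assumptions the bird that strikes only the left key matches by chance on about half of its responses, so a reinforcement that has been set up is collected within a few seconds, and the hourly total is practically that of true matching.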

A sample of the data obtained in a 
complex experiment of this sort is given 
in Fig. 14. Although this pigeon had 
learned to match color under continuous 
reinforcement, it changed to the spuri- 
ous solution of a color preference under 
periodic reinforcement. Whenever the 
sample was red, it struck both the sam- 
ple and the red side key and received all 
reinforcements. When the sample was 
green, it did not respond and the side 
keys were not illuminated. The result 
shown at the beginning of the graph in 
Fig. 14 is a high rate of responding on 
the upper graph, which records matching
responses. (The record is actually
step-wise, following the presence or ab- 
sence of the red sample, but this is lost 
in the reduction in the figure.) A color 
preference, however, is not a solution to 
the problem of opposites. By chang- 
ing to this problem, it was possible to 
change the bird’s behavior as shown be- 
tween the two vertical lines in the fig- 
ure. The upper curve between these 
lines shows the decline in matching re- 
sponses which had resulted from the 
color preference. The lower curve be- 
tween the same lines shows the develop- 
ment of responding to and matching the 
opposite color. At the second vertical 
line the reinforcement was again made 
contingent upon matching. The upper 
curve shows the reestablishment of 
matching behavior while the lower curve 
shows a decline in striking the opposite 
color. The result was a true solution: 
the pigeon struck the sample, no mat- 
ter what its color, and then the corre- 
sponding side key. The lighter line 
connects the means of a series of points
on the two curves. It seems to follow
the same rule as in the case of choos- 
ing: changes in the distribution of re- 
sponses between two keys do not in- 
volve the over-all rate of responding to 
a key. This mean rate will not remain 
constant under the spurious solution 
achieved with a color preference, as at 
the beginning of this figure. 

These experiments on a few higher 
processes have necessarily been very 
briefly described. They are not of- 
fered as proving that theories of learn- 
ing are not necessary, but they may 
suggest an alternative program in this 
difficult area. The data in the field of 
the higher mental processes transcend 
single responses or single stimulus-re- 
sponse relationships. But they appear 
to be susceptible to formulation in terms 
of the differentiation of concurrent re- 
sponses, the discrimination of stimuli, 
the establishment of various sequences 
of responses, and so on. There seems
to be no a priori reason why a complete
account is not possible without appeal
to theoretical processes in other dimensional
systems.


Conclusion 


Perhaps to do without theories alto- 
gether is a tour de force that is too 
much to expect as a general practice.
Theories are fun. But it is possible
that the most rapid progress toward an 
understanding of learning may be made 
by research that is not designed to test 
theories. An adequate impetus is sup- 
plied by the inclination to obtain data 
showing orderly changes characteristic 
of the learning process. An acceptable 
scientific program is to collect data of 
this sort and to relate them to ma- 
nipulable variables, selected for study 
through a common sense exploration of 
the field. 

This does not exclude the possibility 
of theory in another sense. Beyond the 
collection of uniform relationships lies
the need for a formal representation of
the data reduced to a minimal number 
of terms. A theoretical construction 
may yield greater generality than any 
assemblage of facts. But such a construction
will not refer to another dimensional
system and will not, therefore,
fall within our present definition.
It will not stand in the way of our 
search for functional relations because 
it will arise only after relevant variables 
have been found and studied. Though 
it may be difficult to understand, it will 
not be easily misunderstood, and it will 
have none of the objectionable effects of 
the theories here considered. 

We do not seem to be ready for 
theory in this sense. At the moment 
we make little effective use of empirical, 
let alone rational, equations. A few of 
the present curves could have been 
fairly closely fitted. But the most elementary
preliminary research shows that
there are many relevant variables, and
until their importance has been experi- 
mentally determined, an equation that 
allows for them will have so many arbi- 
trary constants that a good fit will be a 
matter of course and a cause for very 
little satisfaction.


AN OPERATIONAL APPROACH TO SOME PROBLEMS
IN PSYCHOLOGICAL MEASUREMENT


BY ANDREW L. COMREY 
The University of Illinois 


It is often stated that a science be- 
comes more mathematical as it matures. 
Psychology is certainly no exception to 
this rule, for the past few decades have 
seen a tremendous development of this 
field as a quantitative science. Nu- 
merical methods of treating data have 
been expanded to such a point that 
scarcely any area of psychology has re- 
mained untouched. This trend has not 
developed entirely without opposition or 
criticism, however, for there have been 
many who believe that psychological 
phenomena cannot be reduced to quan- 
titative terms. Others have pointed out 
real or imagined flaws in the techniques 
that psychologists have employed. Some 
of the criticisms leveled against the use 
of numerical description in psychology 
have been justified, for quantitative 
treatment often has been incorrectly ap- 
plied. Other criticisms, however, have 
been leveled at the philosophy behind 
psychological measurement. It has been 
suggested that psychology is limited to 
a rank-order type of measurement be- 
cause it fails to meet the requirements 
laid down for the measurement of such 
magnitudes as length, mass, and so on. 
The purpose of this paper will be to dis- 
cuss a few such criticisms in the light of 
the traditional theory of measurement 
and to offer an interpretation which 
should lend some much needed support 
to the basic rationale behind psycho- 
logical measurement. 


THE TRADITIONAL THEORY OF 
MEASUREMENT 


The works of Campbell (4, 5, 6) are 
probably the most often quoted sources 
on the requirements for measurement.
For this reason, his treatment of the
traditional point of view will be taken 
as typical. Campbell states three con- 
ditions which he considers essential to 
measurement. His First Condition of 
Measurement is stated as follows (6,
p. 6):


The systems measured must, in virtue of 
the property concerned, be the field of a 
pair of converse T.A. [transitive asymmetrical]
relations and of the T.S. [transitive
symmetrical] relation associated with
them; every system must be either > or <
or = every other, and must be = at least
one other. 


This statement is approximately the 
same as later statements of the require- 
ments for ordinal measurement, except 
for the phrase “every system must be
. . . = at least one other.” Some attention
will be given to this assertion later.
In algebraic form the above statement 
of the First Condition would be given 
as follows: 


1. A > B, or A < B, or A = B (and only one of these is true)
2. If A > B, B ≯ A
3. If A > B, and B > C, then A > C
4. If A = B, then B = A
5. If A = B, and B = C, then A = C
6. For all A, there is a B such that A = B.
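
For a finite collection of systems, the six statements can be checked mechanically once the comparisons "greater" and "equal" have been given operational meaning. The following is a minimal sketch, not a procedure taken from Campbell: numbers stand in for the outcomes of the physical comparisons, "less" is taken to be the converse of "greater", and the function name is hypothetical.

```python
import operator
from itertools import product


def satisfies_first_condition(systems, greater, equal):
    """Check statements 1 to 6 on a finite collection of systems."""
    idx = range(len(systems))
    for i, j in product(idx, idx):
        a, b = systems[i], systems[j]
        # 1. exactly one of A > B, A < B, A = B is true
        if [greater(a, b), greater(b, a), equal(a, b)].count(True) != 1:
            return False
        # 2. "greater" is asymmetrical
        if greater(a, b) and greater(b, a):
            return False
        # 4. "equal" is symmetrical
        if equal(a, b) != equal(b, a):
            return False
    for i, j, k in product(idx, idx, idx):
        a, b, c = systems[i], systems[j], systems[k]
        # 3. "greater" is transitive
        if greater(a, b) and greater(b, c) and not greater(a, c):
            return False
        # 5. "equal" is transitive
        if equal(a, b) and equal(b, c) and not equal(a, c):
            return False
    # 6. every system is equal to at least one other system
    return all(
        any(equal(systems[i], systems[j]) for j in idx if j != i)
        for i in idx
    )


# Hypothetical rod lengths in millimetres.
print(satisfies_first_condition([30, 50, 30, 50], operator.gt, operator.eq))  # True
print(satisfies_first_condition([30, 50, 70], operator.gt, operator.eq))      # False
```

The second collection is perfectly ordered yet fails, and it fails only on statement 6, since no rod has an equal. This is the clause that sets Campbell's condition apart from the usual requirements for ordinal measurement.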


Campbell’s Second Condition of Meas- 
urement states that (6, p. 14) “. . . the 
systems to be measured must be capable 
of a certain kind of combination, which 
will be termed addition, suitable for this 
purpose.” Such a type of combination 
must exhibit the following properties, 
termed the “laws of addition”:


7. A + B = B + A
8. A + B > A (A ≠ 0 and B ≠ 0)
9. If A = C and B = D, then A + B = C + D
10. (A + B) + C = A + (B + C)

The Third Condition of Measurement 
given by Campbell is that it must be 
possible to form a standard series of 
magnitudes such that for any given 
magnitude there will be a member of 
the standard series, or combination of 
members, to which it is equal. This 
standard series, according to Campbell, 
must be formed by taking a particular 
magnitude as the initial standard to be 
assigned the numeral 1, for example. 
Another magnitude can be found equal 
to it and added to the first standard to 
obtain a magnitude of 2. By successive 
addition a portion of the standard se- 
ries can be built up in integral multiples 
of the initial standard. Partial stand- 
ard series must be formed and fitted 
into the standard series to take care of 
magnitudes falling between the elements 
of the first series of integral values. 
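
The construction of a standard series can be sketched as follows. Exact fractions stand in for physical magnitudes, and the function uses only the two operations discussed above, adding a standard to the series and comparing the result with the magnitude to be measured. The function name and the `parts` parameter (the number of equal pieces making up one unit in the partial series) are hypothetical.

```python
from fractions import Fraction


def measure(magnitude, unit, parts=10):
    """Assign a numeral by matching a magnitude against a standard series.

    Returns the numeral reached and whether the series member found is
    exactly equal to the magnitude.
    """
    magnitude = Fraction(magnitude)
    unit = Fraction(unit)
    series_total = Fraction(0)
    numeral = Fraction(0)
    # Integral portion: keep adding copies of the initial standard.
    while series_total + unit <= magnitude:
        series_total += unit
        numeral += 1
    # Partial standard series: pieces of which `parts` together equal one unit.
    piece = unit / parts
    while series_total + piece <= magnitude:
        series_total += piece
        numeral += Fraction(1, parts)
    return numeral, series_total == magnitude


print(measure(Fraction(7, 2), 1, parts=2))    # (Fraction(7, 2), True)
print(measure(Fraction(10, 3), 1, parts=10))  # (Fraction(33, 10), False)
```

In the second case no member of the series built from tenths is equal to the magnitude, so a finer partial series would be needed before the Third Condition could be met for it.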

The foregoing constitutes a brief
sketch of the traditional requirements 
for fundamental measurement as formu- 
lated by perhaps the most often quoted 
authority in the field. This is the pat- 
tern which has been carried over from 
the field of physics by some writers as a 
criterion by which measurement in psy- 
chology should be judged. There seems 
little doubt that magnitudes which 
properly fit the pattern described constitute
examples of the most complete
type of measurement. Such properties 
as length, mass, resistance, and so on, 
are amenable to such treatment. The 
question becomes one of determining 
just how completely such criteria must 
be met under varying circumstances of 
measurement and whether there are alternative
methods of developing measurement scales.


PURPOSE OF MEASUREMENT
REQUIREMENTS


Mathematical systems, particularly of 
number, constitute formal systems with 
definite structural properties. If the re- 
lations and operations in such mathe- 
matical theories are to be used legiti- 
mately to describe conditions in some 
empirical context, it is necessary to 
demonstrate that the empirical mate- 
rials are amenable to the kind of form 
or structure inherent in the mathemati- 
cal system employed. The purpose of 
a set of measurement requirements can 
be described best as one of determining 
whether a particular variable possesses 
a quantitative structure similar in form 
to that of the system of rational numbers.¹
The first step is to define a class
of magnitudes with respect to some 
property, e.g., lengths of objects. Sec- 
ondly, semantic rules (7) must be set 
up whereby certain essential mathemati- 
cal relations and connectives can be 
given physical significance. Thus, op- 
erational definitions must be given for 
the relations “greater,” “less,” “equals,” 
and for the connective “addition.” Hav- 
ing stated what is being measured, what 
certain important relations and opera- 
tions mean in the physical context, it 
remains to be demonstrated that the 
relations and operations defined have 
certain essential characteristics for the 
given class of events. These essential 
characteristics are what Campbell at- 
tempts to describe in his set of meas- 
urement requirements. 

For example, the relation “greater,” 
and its converse “less,” must be transi- 
tive and asymmetrical. The relation 
“equals” must be transitive and sym- 
metrical. The physical operation of 
addition must have the characteristics 
stated in the laws of addition. Having
defined these various relations and operations,
experiments may be performed
with a sufficient sample of the class of
events considered to determine whether
the essential characteristics outlined do
apply. When and if this can be shown
by experiment, a considerable degree of
structural similarity has been demonstrated
between the physical system of
magnitude on the one hand and the abstract
mathematical system of rational
numbers on the other hand. This is the
case because the requirements for measurement
have traditionally involved a
partial statement of the axioms of quantity
from which the ordinary theorems
of arithmetic can be deduced.

¹ Irrational numbers are obtained through
calculation with measurements but not in the
direct process of measurement itself.

It should be pointed out that Camp- 
bell’s requirements for measurement 
seem to reflect a slightly different point 
of view than that just stated. It ap- 


pears that Campbell has included in his 
requirements statements that are con- 
cerned with the physical possibility of 
assigning definite numbers by a given 


process. His insistence that in the class 
of magnitudes there must be at least one 
element equal to every element and his 
Third Condition of Measurement seem 
to be more involved with the physical 
possibility of building up a standard 
series for measurement by comparison 
than in the task of demonstrating a 
similarity of structure between the 
physical and the mathematical. It 
would seem advisable to separate those 
requirements which deal with demon- 
strating the legitimacy of applying a 
mathematical system and those require- 
ments which deal with the practical 
problems of carrying out such an ap- 
plication. 

Having satisfied Campbell’s require- 
ments, however, it is a simple matter to 
assign numbers to represent various 
quantitative degrees of the particular 
property being considered. This as- 
signment takes place through the de- 
velopment of a standard series by means 
of successive addition as described
briefly under Campbell’s Third Condition
of Measurement. It is the use of
this method which demands the ques- 
tioned requirements given by Campbell. 
It is the opinion of the present writer 
that these two requirements might be 
better separated from the others since 
it is conceivable that a method of as- 
signing numbers might be developed 
which would not involve the same pro- 
cedure but which would be equally 
rigorous. Whether this will ever be 
done is another matter. 

Having defined a class of magnitudes, 
established operational definitions of 
crucial relations and operations, and 
demonstrated the suitability of such de- 
fined physical relations and operations, 
it becomes possible to apply the calculus 
of arithmetic to numbers representing 
physical magnitudes. Numbers assigned 
to such magnitudes can be added, di- 
vided, subtracted, and multiplied and 
in general manipulated with every ex- 
pectation that such manipulation will 
have some significance with respect to 
the physical system thus described. In 
addition, certain interesting relations 
among numbers have their correspond- 
ing relations among the physical magni- 
tudes they represent. Equal differences 
and ratios of numbers represent equal 
amounts and ratios of magnitudes in 
the physical domain. In general the 
arithmetical relations so familiar to us 
can be expected to hold also for the 
system of magnitude. 


A DISCUSSION OF CERTAIN
CRITICISMS 


It is apparent that few, if any, psy- 
chological properties are amenable to 
the type of treatment by which funda- 
mental measurement is developed. In 
particular, there seems to be no way of 
developing a satisfactory operation of 
addition, upon which Campbell’s Second 
and Third Conditions of Measurement
depend. This is not a failing peculiar
to psychology, of course, for even in 
physics only a few variables are funda- 
mentally measurable. It has been tra- 
ditional to form a dichotomy of meas- 
urable properties upon the basis of their 
degree of conformity to the measure- 
ment requirements given by Campbell. 
Those quantities which are capable of 
satisfying the First Condition of Meas- 
urement, i.e., the requirements for or- 
der, have been designated as “intensive” 
quantities. The ones which satisfy the 
laws of addition as well have been 
called “extensive” quantities or magni- 
tudes. Measurement by means of nu- 
merical laws is considered as somewhat 
apart from these, but since the laws of 
addition need not be satisfied for a 
variable measured in such a manner, 
this type of measurement is considered 
more nearly intensive than extensive. 
The traditional theory of measure- 
ment has not allowed for any inter- 
mediate degree between intensive and 
extensive measurement and, as a conse- 
quence, those properties for which a 
suitable operation of addition cannot be 
found have been relegated to the level 
of intensive or rank-order measurement. 
Some psychologists (1, 10, 15) have not 
been satisfied with this dichotomy, but 
few, if any, have given a very satisfac- 
tory analysis of the entire problem 
which would justify a different way of 
thinking about measurement. Psycholo- 
gists frequently do not abide strictly by 
the traditional analysis of the theory of 
measurement, for, in spite of the fail- 
ure of addition for psychological prop- 
erties, measurement scales have been 
used for which characteristics beyond 
the rank-order level have been claimed. 
Psychologists talk of equal-unit scales 
and ratio scales without any reference 
to an additive operation at all. Stevens 
(15) proposed a series of scales ranging 
from nominal through ordinal and interval
scales up to a ratio scale without
an additive operation being introduced.
Stevens rejects the division of measure- 
ment scales into intensive and extensive 
types and minimizes the importance of 
addition. He does not advance suffi- 
cient support for his position, however, 
in the opinion of the present author. 

Other writers (8, 9, 12, 14) have 
criticized psychological measurement on 
these very points. They imply that 
measurement without a satisfactory op- 
eration of addition cannot extend be- 
yond the rank-order level. Johnson 
(12) contends that psychologists must 
satisfy the requirements for addition if 
they are to leave the rank-order level 
for more complete forms of measure- 
ment. Smith is quite specific on this 
point. He states: 


It may be that some of the qualities 
dealt with in educational measurement will 
turn out to be capable of expression in 
really equal units. If this occurs, no one 
can predict what the particular procedure 
of obtaining them will be. But it can be 
asserted that whatever it is, it will be ex- 
perimental and will publicly demonstrate 
that such qualities have a structure con- 
sonant to that of the axioms of addition. 
For in the process of satisfying the condi- 
tions of addition, equal units are derived 
(14, pp. 141-142). 


There seems to be considerable dis- 
agreement, then, as to whether psy- 
chologists can develop measurement pro- 
cedures which exceed the level of rank 
order without satisfying the require- 
ments for addition. The problem is 
that of determining whether the com- 
plete requirements for fundamental 
measurement constitute only a sufficient 
condition for developing extra-ordinal 
properties in measurement scales or 
whether these requirements are necessary
and sufficient.

This is perhaps the basic point in 
question between critics and supporters 
of psychological measurement as it is 
today. Other problems have been
raised, but they are frequently more
easily resolved or less bothersome. Di- 
rectly related to this issue is the ques- 
tion of whether the statistical pro- 
cedures being applied to psychological 
data are theoretically sound. Boring 
states: 


All those statistical constants, that im- 
ply a scale of equivalent units, violate in 
use the conditions of the case and lead to 
a precision of result that is an artifact. 
The serial constants, that do not presup- 
pose a unit, yield less intricate resultants, 
but they present a rougher picture that rep- 
resents truly the rough material which they 
describe (2, p. 33). 


Boring thus objects to the use of such 
statistical constants as means and stand- 
ard deviations as well as other statisti- 
cal procedures which attach significance 
to the size of differences between ad- 
jacent scale values in measurement un- 
less a unit has been established. This 
conclusion seems reasonable when the 
formulas for the mean and standard 
deviation are examined. The mean as 
a summation of raw scores divided by 
the number of scores will naturally be 
affected by the absolute size of the nu- 
merical values added. If a scale is em- 
ployed in which equally spaced scores 
do not represent anything like equal intervals
in terms of some experimental
set of operations, little significance can 
be attached with confidence to the exact 
mean value. The same considerations 
apply to the standard deviation, since 
it also is obtained by manipulating score 
intervals. Stevens (15) has also pointed 
out that some sort of unit must be de- 
fined before means and standard devia- 
tions may be calculated legitimately. It 
goes without saying that those more 
complex statistical procedures which are 
based upon means and standard devia- 
tions depend upon the validity of the 
latter for their support. 
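
The dependence of the mean upon the unit can be illustrated with a small numerical case. The scores and the rescaling below are invented for the purpose: the rescaling keeps every score in the same rank order but makes the intervals between adjacent scale values unequal. The sketch compares a statistic that presupposes a unit (the mean) with a serial constant that does not (the median).

```python
from statistics import mean, median

group_x = [1, 5, 5]
group_y = [3, 3, 4]

# A hypothetical order-preserving rescaling of the same scale values.
rescale = {1: 10, 3: 30, 4: 40, 5: 42}

x2 = [rescale[s] for s in group_x]
y2 = [rescale[s] for s in group_y]

print(mean(group_x) > mean(group_y))      # True:  X has the higher mean
print(mean(x2) > mean(y2))                # False: the order of the means reverses
print(median(group_x) > median(group_y))  # True
print(median(x2) > median(y2))            # True:  the medians keep their order
```

Nothing about the rank order of the scores has changed, yet the conclusion drawn from the means is reversed. If only the order of the scale values is established by the operations of measurement, a comparison of means therefore rests on an arbitrary feature of the numbering.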

The critics demand of psychology 
that it develop an experimental operation
for addition before any claims are
made that its measurement scales ex- 
ceed the ordinal level. On the other 
hand it has been pointed out that the 
use of such statistics as the mean and 
standard deviation demands an equal
unit of measurement in some sense. 
Since psychology doesn’t have an addi- 
tive method of treating its data and 
since statistical treatment of the type 
mentioned is widely used, it seems rea- 
sonable to ask if there is not some 
method of approaching this entire prob- 
lem other than through Campbell’s re- 
quirements. Certainly some other ap- 
proach is needed if psychological pro- 
cedures are to be placed on a sound 
theoretical foundation. 

The kind of orientation toward meas- 
urement implicit in the approach of 
Campbell and his followers is one which 
reifies the pure mathematical system as 
the perfect form, so to speak, toward 
which we must strive. No thought is 
given to the possible usefulness of sys- 
tems of numerical treatment which de- 
viate from this conventional pattern. 
The tendency among social scientists 
has been to devise new and unconven- 
tional techniques to handle their prob- 
lems of measurement. Critics of psy- 
chological measurement have been dis- 
turbed by this departure from tradition 
and have been apt to attribute it to 
ignorance. Much of the misunderstand- 
ing has developed because of a con- 
fusion with respect to the ultimate cri- 
teria by which the application of mathe- 
matics to physical phenomena is to be 
judged. 

Almost without exception critics of 
psychological measurement have, in ef- 
fect, adopted a set of logical and opera- 
tional criteria as the ultimate standard 
by which a procedure of measurement 
is to be judged. Many of these critics 
have no doubt forgotten that such cri- 
teria are not ultimate, even in physics. 
The true criterion by which a scientific
procedure is to be judged is productivity.
A theory, method, or system
will remain or be discarded depending 
upon its contribution to the description, 
explanation, or control of the world 
about us. The application of mathe- 
matical theories to empirical data has 
been helpful in achieving this goal, 
and particularly, arithmetical statements 
have been useful in formulating nu- 
merical relationships between different 
variables. In order for these theories 
to be applied, they demand that a cer- 
tain logical structure be met by the 
physical system to which they are fitted. 
In physics it has been the case that 
systems which satisfy the logical re- 
quirements of known formal systems in 
their empirical interpretations have been 
fruitful in application. 

Where it is manifestly impossible to 
proceed by this route, it is only natural 
to retreat to the original standards in 
evaluating techniques. Thus, in the so- 
cial sciences, the methods which have 
been devised cannot be judged by the 
criteria applicable in physics because 
the problems are different from those 
of physics and the solutions have also 
been of a different nature. If numerical 
methods of description can be applied 
which aid in describing and predicting 
human behavior, then it is absurd to ob- 
ject to their use on the basis of a fail- 
ure to satisfy a set of conditions de- 
signed for a different context. In 
evaluating methods of measurement in 
psychology, and in devising new ones, 
the practical purposes which those meth- 
ods are to serve must be considered. A 
blind struggle to satisfy a set of logical 
criteria for their own sake may be un- 
successful and even success might prove 
disappointing. For this reason let those 
who would improve psychological meas- 
urement be aware of what is involved 
in fundamental measurement and its requirements
but not be misled into making
it a fetish.


AN OPERATIONAL INTERPRETATION


It is clear that the traditional ap- 
proach to measurement has crystallized 
the meanings of numbers assigned to 
magnitudes into two moulds. The first 
of these is that of rank-order measure- 
ment, in which the relations expressed 
by numbers assigned are essentially 
those of a simple ordered series (11). 
Here the numbers assigned merely rep- 
resent rank order in a series with re- 
spect to a particular quantity. No in- 
terpretation can be placed upon ratios, 
intervals, or other such concepts. This 
is one of the fixed patterns of meaning. 
The other is that of fundamental meas- 
urement. Here addition is established, 
and from it the whole pattern of rela- 
tions characterized by the propositions 
of arithmetic becomes applicable to the 
physical field. Merely through develop- 
ing a process of addition a consider- 
able increment to the richness of num- 
ber meanings in measurement can be 
achieved. 

Convenient as these two fixed pat- 
terns may be in some contexts, they are 
not too helpful in psychology, for the 
first is not extensive enough for our pur- 
poses and the second appears out of 
reach. Furthermore there is no reason 
why psychologists should be limited to 
two patterns of meaning given numbers 
in measurement when there is practi- 
cally no limit to the number of possible 
patterns. The position to be advanced 
here is that numerical expression may 
be considered as a language which 
varies theoretically between the two ex- 
tremes of conveying no meanings asso- 
ciated with the mathematical properties 
at all to the instance where it conveys 
all those meanings. The first case is 
closely approximated by the application 
of numbers as mere identification tags, 
such as numbers given football players. 
The closest approximation to the other 
extreme is given by fundamental measurement.
Actually, there is no situation
of an empirical nature in which all the
properties of numbers apply completely. 
In the system of rational numbers, for 
example, there is always a rational num- 
ber between any two other rationals, 
however close together they may be. 
Such a proposition could never be veri- 
fied for any physical system. Further- 
more in applications of numbers to con- 
tinuous variables there must always be 
an arbitrary selection of some quantity 
to be assigned a particular number. 
Fundamental measurement is the best 
of our types of measurement, but it is 
not perfect. 

The meanings properly attributable 
to numbers assigned in measurement 
rest upon the set of operations which 
have been carried out in the process of 
measurement. When a statement of
empirical fact is expressed in mathe-
matical form, the meaning carried or
implied cannot transcend the specific
operations which have been performed.
Bridgman has made this point in his
discussion of concepts in physics. He
states:


To find the length of an object, we have 
to perform certain physical operations. The 
concept of length is therefore fixed when 
the operations by which length is measured 
are fixed: that is, the concept of length in- 
volves as much as and nothing more than 
the set of operations by which length is de- 
termined. In general, we mean by any 
concept nothing more than a set of opera- 
tions; the concept is synonymous with the 
corresponding set of operations (3). 


Bridgman further points out that even 
the concept of length itself differs from 
context to context because the set of
operations for its determination varies.
Length, when considered in stationary
systems, is not the same as length in
systems with velocity approaching the
speed of light; nor is it the same in the
realm of the very small as it is in the
infinitely large.

When numbers are applied to magni-
tudes of some variable, then they simply
indicate that certain operations have 
been carried out and they imply just as 
much, and only as much, as went into 
those operations. When operations are 
devised which meet the criteria laid 
down by Campbell’s requirements for 
fundamental measurement, the numbers 
assigned have a specific and rather ex- 
tensive meaning; when the operations 
performed are just sufficient to insure 
a transitive asymmetrical order of the 
quantities to which the numbers are as- 
signed, another meaning is applicable; 
numerical laws provide still other mean- 
ings for numbers in measurement; and 
finally it can be stated that there are as 
many meanings to be given numbers as- 
signed in measurement as there are sets 
of operations by which such assignment 
is accomplished. 

It is quite clear that there is no 
a priori reason why psychologists must 
confine themselves to one level of nu- 
merical description, or two, assuming 
that fundamental measurement should 
by some unknown means become avail- 
able in certain instances. Contrary to 
the implication of many critics, there is 
no valid dichotomy which can be estab- 
lished among measurement procedures 
upon the basis of addition. This di- 
chotomous classification of measurement 
procedures has become thoroughly in- 
trenched because the purposes of meas- 
urement have frequently been met in the 
past through the application of a cal- 
culus of arithmetic to measurements. 
In order to apply such a calculus, the 
operations of addition, division, sub- 
traction, and multiplication, plus cer- 
tain relations, needed definition. This 
has been done through the process of 
addition. 

There seems to be no other procedure 
available at present which can insure 
the applicability of a calculus of arith-
metic to measurements except that of
fundamental measurement with its proc-
ess of addition. It is to be emphasized,
therefore, that any intermediate level of 
numerical description between rank-or- 
der and fundamental measurement that 
may be possible in psychology will not 
allow the application of the calculus of 
arithmetic. To be sure, arithmetic may 
be applied as though the measurements 
obtained were of the fundamental kind, 
but the interpretation of the results 
must be considered uncertain. 

Granted that psychological measure- 
ment does not have suitable additive 
operations and hence cannot apply the 
calculus of arithmetic for the manipula- 
tions of its measurements, as can be 
done with measurements of length and 
mass, what can be done with psycho- 
logical measurements? Certainly limita- 
tions are placed upon the development 
of laws, such as the numerical laws of 
physics relating fundamentally measur- 
able variables. Psychology is a long 
way from this level in most areas of its 
endeavor, however, so this limitation is 
not apt to prove too confining for some 
time. Most of the work in psychology 
involves the application of statistical 
methods to data for the purpose of 
establishing relationships and drawing 
conclusions. To what extent can such 
statistical treatment take place without 
the advantages of additive scales? 

Statistical procedures for the most 
part do demand something more than 
rank-order scales. The mean and stand- 
ard deviation, for example, require an 
established unit in some sense, as do all 
those procedures which are based upon 
these statistical constants. In spite of 
statements to the contrary by some 
critics, it will be asserted that addition 
is not necessary in measurement scales 
designed for use with most of the com- 
mon statistical methods. As was pointed 
out previously, the interpretation of 
means and standard deviations must 
depend upon the type of unit employed,
but no addition in the experimental
sense seems required. In answer to the 
possible objection based upon the fact 
that scores are added in finding a mean, 
it should be pointed out that no quan- 
tity interpretation is placed upon the 
sum of scores. The whole procedure is 
merely a means of establishing a defined 
level for a group of scores. This de- 
fined level will be fixed and meaningful 
providing a unit of measurement has 
been established. 
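
The dependence of the mean upon an established unit can be illustrated with a small sketch. The scores and the square-root rescaling below are hypothetical, chosen only to show that a change of scale which preserves rank order may reverse the order of two group means while leaving the order of the medians untouched.

```python
import math
from statistics import mean, median

# Hypothetical scores for two groups (not data from the paper).
group_a = [1, 2, 9]
group_b = [3, 4, 4]


def rescale(scores):
    """An order-preserving change of scale (square root), chosen for illustration."""
    return [math.sqrt(s) for s in scores]


def larger_group(stat, a, b):
    """Return 'A' or 'B' according to which group has the larger statistic."""
    return "A" if stat(a) > stat(b) else "B"


# The means change places under the rescaling; the medians do not.
print("mean, original scale:  ", larger_group(mean, group_a, group_b))
print("mean, rescaled:        ", larger_group(mean, rescale(group_a), rescale(group_b)))
print("median, original scale:", larger_group(median, group_a, group_b))
print("median, rescaled:      ", larger_group(median, rescale(group_a), rescale(group_b)))
```

On a scale with only rank-order meaning both versions of the scores are equally admissible, so a comparison of means is not fixed until some set of operations has established a unit.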

The equal-unit property of a meas- 
urement scale, then, is certainly one of 
great importance for many of the pro- 
cedures which can be applied to psycho- 
logical data. It is certain, of course, 
that such a property is obtained along 
with many others when fundamental 
measurement is established through a 
process of addition. This is not the 
only way an equal-unit property can be 
built into a measurement scale, however. 
If it were, psychology would be forced 
to retreat to rank-order types of treat- 
ment if it wished to use procedures with 
a sound basis in theory. Operational 
procedures may be designed with pre- 
cisely the function of lending experi- 
mental significance to interpretations of 
differences between measurements. Fur- 
thermore, such operations may be de- 
veloped without any reference to a ba- 
sic experimental process for addition. 

The line of approach suggested here 
constitutes a direct attack, so to speak, 
upon certain important properties which 
are desirable in measurement scales. 
Ordinarily such properties might be ob- 
tained indirectly through developing an 
additive operation from which other 
properties follow. But failing the pos- 
sibility of this approach, there is no rea- 
son why these desirable properties them- 
selves may not be approached directly 
by means of an operational attack. 
This is not to say that the results will 
be the same as those obtained by de-
veloping a process of addition. Cer-
tainly in many cases the very property
in question will be achieved with only 
partial success. For example, a pro- 
cedure might be developed in which it is 
possible to relate only adjacent intervals 
on a scale without comparing remote in- 
tervals. Here a part of the equal-unit 
property may be achieved, but it is far 
from perfect. However, some progress 
beyond rank-order measurement is defi- 
nitely achieved. 

Where individual sets of operations 
are used for developing a rather isolated 
property in some measurement scale, 
the confidence placed in the results of 
numerical treatment is not as great as 
that possible with the more complete 
measurement procedures, but a step has 
been taken in the right direction. The 
problem of deriving equal units can be 
attacked by many different kinds of op-
erations. By working with these differ-
ent methods over a period of time, the
more satisfactory procedures will gradu-
ally become known. Similarly other im-
portant properties may be developed in
measurement scales. Among these may
be included ratio properties, and what is 
significant, a number of properties which 
are not usually associated with numbers 
themselves. An example of such a use 
of numerical description will be pre- 
sented in the next section. 

The point to be emphasized here is 
that the possibility exists for obtaining 
levels of numerical description inter- 
mediate to rank-order and fundamental 
measurement. This fact itself might 
have little significance except that such 
intermediate levels of description have 
a definite usefulness since many of the 
purposes of psychological measurement 
can be served by techniques short of 
fundamental measurement but which 
could not be met by rank-order tech- 
niques. In the next section a few ex- 
amples will be outlined briefly to illus- 
trate the principles which have been 
presented.

EXAMPLES


Stevens and Volkmann (16) used the 
method of equal-sense-distances to de-
velop a pitch scale with at least equal-
unit properties. Standard frequencies
were set at 200 c/s and 6500 c/s to be 
played to the subject when he pressed 
a key for each of them. Three other 
keys were provided for frequencies 
which could be adjusted by the subject. 
His task was to vary the frequencies of 
the three intermediate keys until the in- 
tervals between the five frequencies were 
psychologically equal with respect to 
pitch. The instructions emphasized the 
necessity of comparing each interval 
with each other interval in both ascend- 
ing and descending order. The initial 
bisections of the large interval fre- 
quently needed adjustment after the re- 
sulting intervals were themselves bi- 
sected. The entire experiment was re-
peated using intervals from 40 c/s to
1000 c/s and 3000 c/s to 12,000 c/s.
For each of the experiments, a graph 
was constructed plotting pitch against 
frequency. Intervals on the pitch axis 
were equated for the average frequency 
settings between the standards. It 
proved possible to combine all these 
data into one pitch function throughout 
the range used. 

The equal units achieved by experi- 
mental operations of this kind might 
best be designated as “psychologically 
equal units” to forestall any objection 
that such units cannot be considered as 
identical with those obtained through 
more fundamental types of measure- 
ment. There is no necessity for con- 
sidering them to be the same as those 
achieved in fundamental measurement. 
A definite set of operations has been 
employed to lend meaning to the unit 
established on the scale of measurement. 
This is sufficient to warrant the asser- 
tion that a unit has been established 
and that means may be calculated. The
interpretation placed upon this and
other statistical constants must always 
be in terms of the specific operations 
performed. There is little question that 
operational procedures may vary in 
their degree of accuracy in establishing 
such units. This is not a matter of 
theoretical concern, however, for error 
is a relative matter. Even in our most 
exact measurements there remains a 
margin of error. The usefulness of a 
technique will be limited if the degree 
of error is so large as to render results 
little better than chance, but from a 
theoretical point of view, it is more im- 
portant that empirical procedures are 
employed which allow an operational in- 
terpretation of concepts employed in de- 
scription. Where operational procedures 
are developed giving meaning to the 
claim for equal units in a measuring 
scale, then certainly the rank-order level 
of measurement has been significantly 
exceeded. 

In certain cases the uses of numerical 
description in psychological measure- 
ment may be such that unconventional 
meanings are attached to certain rela- 
tions among the numbers assigned. The 
centile scale is an example. Suppose 
scores on a test for a particular group 
are converted into centile values so that 
every individual in the group has a 
number assigned to represent his level 
of achievement. Individuals with higher 
numbers performed better on the test 
than those with lower numbers; in gen- 
eral the rank-order characteristics are 
probably applicable. Further, differ- 
ences between numbers assigned have 
a definite significance. A difference be- 
tween centile scores of 50 and 90 indi- 
cates that 40 per cent of the individuals 
taking the test are included in that 
range. Such an interpretation of a dif- 
ference between numbers is very differ- 
ent from an interpretation of differences 
in arithmetic. Some would object to 
such usage because such interpretations
have no place in true measurement. In
answer to such criticisms it can be said 
that the assignment of numbers has 
much the same nature as a nominal defi- 
nition, i.e., a definition in which it is 
stated that a word will be used to mean 
a certain thing. When numbers are as- 
signed, it is agreed that they will con- 
vey the meanings which are implied by 
the operations performed. No difficulty 
is encountered if the word, or the num- 
bers, are not thereafter given different 
meanings than were originally invested 
in them. In the example of the centile 
scores no difficulty is encountered until 
someone attempts to give centile scores 
a meaning which is properly associated 
only with numbers assigned by a funda- 
mental process. For example, if the in- 
terpretation is made that a centile score 
of 50 represents twice as much ability 
as a centile score of 25, a gross error 
has been made, for there is nothing in 
the operations performed by which such 
a statement could be justified. 
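
The centile example can be put in the form of a short sketch. The raw scores are hypothetical, and the definition of a centile used here (the percentage of the group scoring below a given score) is one simple convention assumed for illustration. The sketch shows that a difference between centiles is read as a share of the group, and that centiles are unchanged when the raw scores are stretched by an order-preserving transformation, which is why nothing in these operations supports a "twice as much ability" reading.

```python
def centile_ranks(scores):
    """Percentage of the group scoring below each score (assumed convention)."""
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]


def share_between(centiles, low, high):
    """Percentage of the group whose centile lies in the range [low, high)."""
    inside = sum(low <= c < high for c in centiles)
    return 100.0 * inside / len(centiles)


# Hypothetical raw scores for 100 examinees.
raw = list(range(1, 101))
centiles = centile_ranks(raw)

# Share of the group falling between centile 50 and centile 90.
print(share_between(centiles, 50, 90))

# Cubing the raw scores changes every ratio among them but no centile.
stretched = [s ** 3 for s in raw]
print(centile_ranks(stretched) == centiles)
```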

Ratio properties of scales have also
received some attention from psychologists
in recent years. Although such prop-
erties are not so extensively demanded
by statistical operations as the equal-
unit property, they afford a possible ap-
proach to a type of quantitative treat-
ment obtained by more rigorous meth-
ods in physics. An example of a method
developed by the writer will be pre-
sented.² A method of judgment devel-
oped by Metfessel (13) was modified
and combined with a presentation of
stimuli by pairs. Nine lines of varying
lengths were drawn on 12" × 12" pieces
of cardboard and presented to 47 sub-
jects in all possible combinations of two.
The subjects divided 100 points between
the members of each pair in accordance
with the apparent relative lengths of
the two lines. The points assigned to
each line in its comparisons with each
other line were averaged over subjects
separately for each pair of lines. A
matrix of average judgments was thus
obtained in which each entry gave an
average number of points assigned one
line when it was compared with some
other particular line. These average
numbers of points from 100 were con-
verted to ratios of points by dividing
the given average number of points by
the average of the points given the line
with which it was compared.

2 Submitted for publication.

The stimuli were arranged in rank 
order on the basis of the total number 
of points received in all comparisons. 
Then ratios of the largest stimulus to
the next largest, and so on, were com-
puted. The number of such ratios
which could be obtained for each pair of
adjacent stimuli in the rank-order se-
ries was equal to N − 1. One ratio was
available directly since the two stimuli
under consideration were compared di-
rectly. Other values for this ratio were
implied by the interrelations of these
stimuli with others. Let A and B repre-
sent the first two stimuli in the rank-
order series and C a third. Then, the
ratio of A to C divided by the ratio of 
B to C gives a computed value for the 
ratio of A to B. Averages of such ratios 
were computed. These were compared 
with the ratios obtained from actual 
measurements. In no case did any ratio 
from scaling deviate from the ratio by 
measurement more than .08. The re- 
sults of this little experiment demon- 
strate that a type of scaling available in 
psychological research can, in certain 
respects, approximate the accuracy of 
more exact types of measurement. In 
this particular case, ratio properties 
were obtained with considerable ac- 
curacy, and without the benefit of an 
operation of addition upon which such 
properties are traditionally supposed 
to rest. 
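
The computation described above may be sketched as follows. The line lengths and the error-free judgments generated from them are hypothetical, as are all function names; the sketch assumes judgments that divide the 100 points in exact proportion to length, which actual subjects would only approximate. It shows how the points matrix is converted to ratios, how the stimuli are ranked by total points, and how the N − 1 estimates (one direct, the rest implied through a third stimulus C) are collected for each adjacent pair.

```python
def points_from_lengths(lengths):
    """Hypothetical error-free judgments: 100 points split in proportion to length."""
    n = len(lengths)
    return [[100.0 * lengths[i] / (lengths[i] + lengths[j]) if i != j else None
             for j in range(n)] for i in range(n)]


def ratio_matrix(points):
    """Points given line i divided by points given line j in their comparison."""
    n = len(points)
    return [[points[i][j] / points[j][i] if i != j else None
             for j in range(n)] for i in range(n)]


def adjacent_ratio_estimates(points):
    """Collect the N - 1 ratio estimates for each adjacent pair in rank order."""
    n = len(points)
    ratios = ratio_matrix(points)
    totals = [sum(p for p in row if p is not None) for row in points]
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    estimates = {}
    for a, b in zip(order, order[1:]):
        direct = [ratios[a][b]]
        # (A to C) divided by (B to C) gives a computed value of A to B.
        indirect = [ratios[a][c] / ratios[b][c] for c in range(n) if c not in (a, b)]
        estimates[(a, b)] = direct + indirect
    return estimates


lengths = [6.0, 4.0, 3.0, 2.0]  # hypothetical line lengths
points = points_from_lengths(lengths)
for (a, b), values in adjacent_ratio_estimates(points).items():
    scaled = sum(values) / len(values)      # averaged ratio from scaling
    measured = lengths[a] / lengths[b]      # ratio by physical measurement
    print(a, b, len(values), round(scaled, 3), round(measured, 3))
```

With error-free judgments the scaled and measured ratios coincide; with averaged judgments from real subjects the two would differ, and the size of that difference is what the comparison reported above assesses.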

SUMMARY


The main points of this paper may 
be summarized as follows:

1. Traditional requirements for com-
plete measurement demand a transitive
asymmetrical order with respect to the 
relation “greater” and a transitive sym- 
metrical relation of equality. Further, 
certain laws of addition must be satis- 
fied. 

2. Critics of psychological measure- 
ment state that no operation of addition 
is available for psychological traits and, 
since equal units are based upon addi- 
tion in fundamental measurement, that 
psychological traits are measurable only 
on the rank-order level. 

3. It has been pointed out that many 
of the statistical procedures used by 
psychologists in treating their data de- 
mand at least an equal-unit type of 
measurement. 

4. As a consequence of these state- 
ments one might conclude that the ap- 
plication of any but ordinal statistics 
can never be justified until a method of 
addition is developed for psychological 
traits. 

5. In the traditional approach, meas- 
urement patterns have been crystallized 
into two categories, “intensive” and “ex-
tensive.” It is pointed out in this paper
that the meanings given numbers in 
measurement can be considered as vary- 
ing with the kind of operations em- 
ployed in measurement. There is no 
a priori reason for limiting the mean- 
ings given numbers in measurement to 
either the ordinal or fundamental pat- 
tern. 

6. By devising suitable operations, 
psychologists can develop measurement 
procedures which will allow the legiti- 
mate application of statistical methods 
without a process of addition. Equal- 
unit and ratio scales are two examples 
of scales which can be obtained by pro- 
cedures available in psychological re- 
search. The equal units obtained by 
direct experimental operations without 
a process of addition should be charac-
terized as “psychologically equal units”
to distinguish them from those devel-
oped in fundamental measurement.

7. Methods of measurement in psy- 
chology intermediate to rank-order and 
fundamental measurement must not be 
considered sufficiently rigorous to allow 
the application of the calculus of arith- 
metic but they are suitable for many 
purposes and definitely exceed the mere 
rank-order level of measurement.


A NOTE ON McGINNIES’ “EMOTIONALITY AND
PERCEPTUAL DEFENSE” 


BY DAVIS H. HOWES AND RICHARD L. SOLOMON 


Harvard University 


Some interesting experimental meas- 
urements of visual duration thresholds 
and galvanic skin responses (GSR) have 
been recently reported in this Journal 
by McGinnies (4). Two results com- 
prise the essence of his experiments: 
(1) Taboo words require longer tachis- 
toscopic exposures before being reported 
correctly than do words that are with- 
out any apparent emotional connota- 
tion; (2) GSR after exposures which 
precede correct report is greater when
the exposed (but not yet reported)
word is taboo than when it is neutral.
These observations have been inter-
preted to indicate perceptual defense,
which McGinnies described as a “per-
ceptual ‘filtering’ of visual stimuli [that]
serves, in many instances, to protect the
observer as long as possible from an 
awareness of objects which have un- 
pleasant emotional significance for him” 
(4, p. 244). “This poses a problem for 
neurophysiological explanation,” he con- 
tinues (p. 249 f.), “which cannot be an- 
swered here. Is the galvanic skin re- 
sponse preceding recognition of critical 
(i.e., taboo) words a result of ‘feed- 
back’ from the cortical association cen- 
ters? Or is autonomic response initi- 
ated as the visual impulses reach the 
optic thalamus?” McGinnies is by no 
means alone in his interpretations:
statements of similar portent could have
been quoted from any of a number of 
recent papers (1). The purpose of the 
present note is to show that appeal to 
“perceptual defense” is unnecessary to 
account for McGinnies’ results.

Thresholds. Extensive data of our
own (2) show that duration thresholds
for words (measured by an ascending
method of limits) vary radically as a
function of the logarithms of their rela- 
tive frequencies of usage, determined by 
the Thorndike-Lorge word counts (6). 
Typical of our results is Fig. 1, which 
shows the mean duration thresholds for 
sixty different words plotted against 
their log word-frequencies as found in 
the Lorge Magazine and Lorge-Thorn-
dike Semantic Counts.¹
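
The conventions adopted for assigning log frequencies (a plural takes the frequency of its singular, a participle one-half the frequency of its root, and a word absent from the tables a log frequency of −1.0) can be sketched as below. The function names and the sample counts are hypothetical; only the three rules are taken from the note.

```python
import math


def adjusted_count(root_count, form="base"):
    """Apply the conventions for inflected forms to a tabled count.

    form: 'base', 'plural' (same frequency as the singular), or
    'participle' (one-half the frequency of the root).
    """
    if form == "participle":
        return root_count / 2
    return root_count


def log_frequency(root_count, form="base"):
    """Log10 frequency; a word absent from the tables is passed as None."""
    if root_count is None:
        # Assumed to occur once in a sample ten times as large: log10(1/10).
        return -1.0
    return math.log10(adjusted_count(root_count, form))


# Hypothetical counts, for illustration only.
print(log_frequency(200))                # base form
print(log_frequency(200, "plural"))      # same as the singular
print(log_frequency(200, "participle"))  # half the root frequency
print(log_frequency(None))               # word not in the tables
```

Halving a count lowers its logarithm by only log10 2, or about 0.3 log units, which is small beside a range of several log units; this is the sense in which the exact convention matters little once log frequencies are plotted.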

The implications for perceptual theory 
of this function between threshold and 
response probability are developed in 
another paper (5). Our contention here 
is that McGinnies’ taboo words might 
be expected to have far higher duration 
thresholds than his neutral words be- 
cause the relative frequencies of the 
former are far lower. His thresholds 
(taken from 4, Fig. 2) are plotted in 
Fig. 2 as a function of word-frequencies 
defined just as in Fig. 1. Circles indi- 
cate his neutral words, triangles his 
critical or taboo words. The slope
drawn through his data actually was
fitted to the neutral-word thresholds
alone,² according to a principle derived
from our own data (the argument is too
involved to report here, but is presented
in detail in another paper [2]). The
taboo words behave pretty much as if
their Lorge-Thorndike frequencies pre-
dicted their duration thresholds ade-
quately.

1 Use of the Thorndike-Lorge tables requires
arbitrary rules for assessing the frequencies of
inflected forms such as plurals and participles.
To a plural formed by adding s or es we as-
signed the same frequency as its singular form;
a participle was assigned a frequency one-half
the frequency of its root. Only 6 of the 60
words whose thresholds are shown in Fig. 1
were plurals, and only one was a participle.
When log frequencies are plotted, as in Fig. 1,
such factors are of little importance.

If words not appearing in the Thorndike-
Lorge tables were assigned a frequency of
zero, their log frequency would be negative in-
finity. It was, therefore, assumed that such
words would occur, on the average, once in a
sample 10 times as large as the Thorndike-
Lorge samples, giving each a log frequency of
−1.0. Again the exact frequency assigned to
such words is of little significance where log
frequencies are used.

2 The particular slope, of course, is of no
consequence to the present argument. It is
only necessary to show that McGinnies’ data
exhibit the same general relationship between
duration threshold and log word-frequency
that holds for the extensive data illustrated by
Fig. 1. The slope of Fig. 2 was drawn simply
to indicate the accuracy with which a fre-
quency formulation predicts the threshold
data.

Fig. 1. The relationship between mean
duration threshold in seconds and log word-
frequency determined by the Thorndike-Lorge
word counts. Duration thresholds for sixty
words are shown. The Pearson correlation
coefficient is −0.79.

A general form of our “frequency
criticism” was suggested to McGinnies
by Bruner (4, p. 250), who had seen
our data. But McGinnies, who of
course could not have known that mean
duration threshold actually trebles over
a range of four log units of word-fre-
quency, rejected the use of Lorge-
Thorndike frequencies to predict his
thresholds on the grounds that “despite
their infrequent occurrence in print . . . 
the critical words are quite common in 
conversational usage” (4, p. 250). Now 
only if the conversational frequencies 
of his critical words are approximately 
equal to those of his neutral words can 
this reasoning justify his disregard of 
the effect of word-frequency on dura- 
tion thresholds. In short, his paper as- 
serts, by implication, that our conversa- 
tions are as often adorned by raped, 
whore, penis, and bitch as by child, 
clear, dance, and music! Horrified, we 
insist that Professor McGinnies speak 
for himself. Common morality, even if 
plain observation were to fail, constrains 
us to believe that his neutral words 
better characterize the conversations of 
at least some of his collegiate subjects. 
We certainly can assure him that our 
own conversations are spiced only very 
rarely indeed by such delicacies of ex- 
pression! 

Fig. 2. Mean duration thresholds in seconds
as related to log word-frequency for the words
used in McGinnies’ experiment. The points
represented by the triangles are for McGin-
nies’ taboo words and the circles represent
his neutral words.

Those who are bold enough to insist
on his extraordinary assumption must
still face the criticism that no facts on
conversational word-frequencies can be
marshalled to its support. On the other
hand, no matter how fervently one may 
believe the Lorge-Thorndike samples to 
have been idiosyncratic, the facts of 
Fig. 1 (and other functions like it) con- 
vert such arguments to a carping at de- 
tail. For as the fact stands now, word- 
frequencies as defined by the Magazine 
and Semantic Counts, not as we guess 
them to be in conversation, have actu- 
ally predicted the duration thresholds 
of college students both in Cambridge, 
Massachusetts, and in Tuscaloosa, Ala- 
bama. 

Galvanic skin response. What about 
the greater GSR’s recorded for pre- 
threshold exposures of taboo words? 
Bruner suggested that this phenomenon 
might result from the greater “effort” 
required to see more difficult (i.e.,
higher-threshold) words (4, p. 250). 
This suggestion has promptly been put 
to experimental test (7), with prelimi- 
nary results indicating little or no cor- 
relation between “effort” and GSR. 
But quite apart from Bruner’s sugges- 
tion we reject as a confusion of terms 
McGinnies’ interpretation of his GSR 
results as “discrimination without aware- 
ness.” How do you know a subject is 
aware of a word unless he discriminates? 
And if he discriminates, how do you 
know he is not aware of it? We are 
happier with a translation to “one kind 
of discrimination before another kind,” 
or, more specifically, “GSR discrimina- 
tion before verbal-report discrimination” 
—a view recently adopted by Bruner 
and Postman (1). 

That GSR discrimination does in fact 
occur to briefer flashes than are re- 
quired for correct verbal response is 
further evidenced by a recent experi- 
ment by McCleary and Lazarus (3). 
Nonsense syllables paired during train- 
ing with electric shock, they found, gave 
a significant GSR on pre-threshold ex- 
posures, while the responses to syllables 
that had no history of shock treatment
were insignificant. It would seem, then,
that almost any written stimulus to 
which GSR has been strongly condi- 
tioned has a tendency to produce a 
GSR before the word will be reported 
verbally—which is only another way of 
saying that GSR discrimination occurs 
with briefer stimulation than does ver- 
bal-report discrimination. 

Two factors operate in experiments 
like McGinnies’, either of which would 
tend to produce the observed result that 
GSR occurs to exposures of taboo words 
that are too brief to bring out the cor- 
rect verbal report: (1) While the dura- 
tion threshold is by definition an all-or- 
none phenomenon, both GSR and the 
duration of exposure can vary continu-
ously; (2) the “set” of the subject
tends to make him inhibit, or “with-
hold,” the act of speaking a taboo word.³
Both factors are capable of experimental 
development. Unfortunately no data 
are provided in McGinnies’ paper on 
the duration of those pre-threshold ex- 
posures to which marked GSRs occur 
for his critical words, for, on the basis 
of either factor, we would expect that 
only a chance proportion of the strong 
GSRs occurred to exposures much 
shorter than the eventually recorded 
thresholds.⁴

3 These considerations indicate that the crit-
ical words would have slightly higher thresh-
olds than neutral words, independently of dif-
ferences in word-frequency. “Withholding,”
therefore, would supplement the effect of the
differences between the Thorndike-Lorge fre-
quencies for the two groups of words. The
importance of “withholding” relative to differ-
ences in word-frequency, however, can be de-
termined only by further experimentation.

4 Perhaps some confirmation of this hypoth-
esis is implicit in McGinnies’ statement that
“Almost without exception, the galvanic skin
response of the observers was greater follow-
ing the final exposure of the critical words;
that is, the one during which recognition oc-
curred” (4, p. 250).

Visual exposure of a word seems not
to have an all-or-none effect, but ap-
pears to increase, in proportion to the
duration of the exposure, the probability
that the exposed word will be reported 
(2). But the verbal report by which 
word-thresholds are conventionally de- 
fined is all-or-none: only one word, the 
most probable at the moment of report, 
is admitted as the response to each ex- 
posure. When exposure is brief, this is 
likely to be some word other than the 
one exposed, for the probability of the 
exposed word will have been increased 
only slightly, often to a level lower than 
that to which chance fluctuations will 
have raised the probability of some 
other word. Eventually, as duration of 
exposure is lengthened, the probability 
of the exposed word will be increased to 
a value higher than any chance fluctua- 
tion can attain. At this point the ex- 
posed word becomes the word of high- 
est probability and hence is reported, 
fulfilling the definition of a duration 
threshold. 

But while verbal report depends only 
on the strongest of the array of word- 
probabilities that follows an exposure, 
no such restriction has to apply to 
GSR. Let us assume, for simplicity’s 
sake, that after a particular exposure 
of pre-threshold duration there are four 
words whose probabilities are very high, 
much higher than those of any other 
word. If any of these four words has
been conditioned in the subject’s life 
history to elicit GSR, a large condi- 
tioned GSR may occur, though only one 
of the words can be reported. Moreover 
the fact that exposing a word increases 
its probability will mean that after the 
relatively long exposures that are just 
short of threshold duration, the exposed 
word will be, on the average, among the 
words of highest probability. If the ex- 
posed word is also one that has been 
conditioned to elicit a strong GSR (Mc- 
Ginnies’ taboo words may have such 
histories), it follows that GSR will tend 
to occur following pre-threshold ex- 
posures of that word. 
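
The argument of the last two paragraphs can be made concrete with a toy sketch. Everything numerical here is invented for illustration: the background strengths, the gain per second of exposure, and the assumption that GSR occurs whenever a conditioned word is among the four strongest. The sketch shows only the logic, namely that verbal report admits the single strongest word while GSR is under no such restriction, so that GSR to the exposed word can precede its correct report.

```python
def word_strengths(duration, exposed, gain, background):
    """Momentary strength of each candidate word after one exposure.

    background maps each word to its chance-level strength at this moment;
    the exposed word gains strength in proportion to exposure duration.
    """
    strengths = dict(background)
    strengths[exposed] = strengths.get(exposed, 0.0) + gain * duration
    return strengths


def verbal_report(strengths):
    """All-or-none report: only the single strongest word is spoken."""
    return max(strengths, key=strengths.get)


def gsr_expected(strengths, conditioned, top_n=4):
    """GSR is assumed to occur if any of the top_n strongest words is conditioned."""
    ranked = sorted(strengths, key=strengths.get, reverse=True)[:top_n]
    return any(word in conditioned for word in ranked)


# Hypothetical values; "taboo_word" stands in for any word conditioned to elicit GSR.
background = {"child": 0.30, "clear": 0.25, "dance": 0.18,
              "music": 0.15, "taboo_word": 0.05}
conditioned = {"taboo_word"}

for duration in (0.01, 0.05, 0.10):
    s = word_strengths(duration, "taboo_word", gain=3.0, background=background)
    print(duration, verbal_report(s), gsr_expected(s, conditioned))
```

In this toy series the intermediate exposure is the case of interest: the exposed word has risen into the strongest four, so GSR is expected, but another word is still reported.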

The subject’s “set” in the usual ta-
chistoscopic experiment tends to “in-
hibit” his actually speaking a taboo
word. McGinnies’ study appears to be 
no exception. Suppose a long exposure 
has raised the probability of the taboo 
word far above the probabilities of any 
other word, so that it would be reported 
if the inhibitory “set” were not present; 
overt report of the word will tend to be 
held back in the atmosphere of scien- 
tific respectability that surrounds the 
experiment. While such a word is with- 
held it will not be recorded for the 
threshold, but its probability is high 
enough to elicit a strong conditioned 
GSR. McGinnies reported that no sub- 
ject admitted to have “withheld” a 
critical word, except for the first one 
that he came across. But since the 
same “set” that would lead to “with- 
holding” might also inhibit admission 
of withholding, not much faith can be 
placed in such reports. “Deliberate 
withholding,” moreover, is only one de- 
scription a subject might give of his 
“set” to inhibit taboo words. In the 
end only experimental manipulation of 
the “set” can serve as evidence on the 
present issue. It should be possible to 
reduce the inhibitory effect of the “set,” 
for example, by removing members of 
the opposite sex from the experimental 
room, using friends for subjects, en- 
couraging informality, etc. 

What does all this mean to the sub- 
ject in the experiment? Perhaps, if we 
were able to get a candid description 
from a rather percipient male subject, 
it would run something like this: “Sup- 
pose you are a young college student, 
prudish despite your disclaimers thereto, 
and respectful to faculty members de- 
spite your comments behind their col- 
lective backs. Your idea of a psycho- 
logical experiment probably is that it is 
designed to explain you in the least 
flattering of all possible terms, and no 
assurances to the contrary are likely to 
convince you otherwise. Your smatter-
ing of Freud puts you on guard, for you
expect that any out-of-the-ordinary re- 
mark you should happen to drop—espe- 
cially if it adumbrates a taboo—will be 
pounced upon immediately and misin- 
terpreted to mean all sorts of shocking 
things. 

“Half curious and half suspicious, you 
enter a strange room marked by per- 
plexing apparatus. Laboratory atmos- 
phere grips you as it does almost every- 
one else who is unfamiliar with it. 
Moreover, not only your professor, but 
a young lady assistant is present to ob- 
serve you (4, p. 246). The instruc- 
tions, backed by the awesome prestige 
of Science, are to report everything you 
see in a flash of light that you observe 
through an eyepiece. 

“You do your best for Science on the 
first word. But with flashes so short 
all you can see is a blur. Among the
many words that look like the blur for
several flashes in a row is river. Finally,
you report river, though the exposure is 
still much too short for you to be per- 
fectly certain; and it proves to be cor- 
rect. Prepared to labor even more dili- 
gently on the next word, you concen- 
trate on the flashes. Nothing is visible 
at first, but soon you begin to make 
words out of the blur. Now one that 
first occurred to you several flashes ago 
seems as certain as river did when you 
guessed it correctly, so you report— 
“But, Heavens, NO! The word is— 
penis! And there is that girl (not to 
mention your professor) hanging on ev- 
ery word! Suppose it really isn’t penis, 
after all (one can’t be sure about tenth- 
of-a-second flashes)—what wouldn’t 
they think about you if, out of the clear 
blue sky, you should volunteer that 
word! Moved to conservatism by those 
considerations, you may wait until you 
are more certain before reporting it. A 
few more flashes, gradually becoming 
longer, sink all hope that the word is 
really not penis. Perhaps you even hold
off a little longer, but more and longer 
exposures force you to face the inexor- 
able fact that the appearance of penis 
will continue until you go ahead and 
say the word.” 

Few investigators would be surprised 
to find a marked deflection in GSR at 
the moment the possibility first occurred 
that penis was the exposure word; and 
even fewer would be surprised to find 
one when the conflict was aroused be-
tween the tendency to report penis
and the tendency to disobey instruc-
tions for the sake of respectability. Re-
porting the neutral words, none of
which anyone would mind saying before
the King of England, raises no con-
flict; nor does it seem likely that GSR
had been conditioned strongly to any 
of them. 

All this discussion must in the end 
come down to a summary of McGin- 
nies’ actual experiment and of its re- 
sults. Tossing overboard our layman’s 
language along with his perceptual fort- 
resses, we can summarize them as 
follows: (1) The duration at which 
verbal discrimination appears is of the 
same order for taboo and for neutral 
words, when the effects of Thorndike- 
Lorge frequencies are extracted. (2) 
Taboo words elicit strong GSRs, which 
occur to exposures that are too short to 
bring forth the correct verbal response, 
just as did shock-associated syllables in 
another experiment. (3) Only the most 
probable word could be reported after 
each exposure, but GSR could occur to 
any word of high probability that had 
been conditioned previously to elicit 
GSR. (4) McGinnies’ experimental 
situation would tend to “set” his sub- 
jects to inhibit overt report of those 
words eliciting strong GSRs. Further 
experimentation is necessary to show 
how strongly this fourth factor affected 
thresholds over and above the effects 
of word frequencies. 

DISCUSSION OF HOWES’ AND SOLOMON’S NOTE ON
“EMOTIONALITY AND PERCEPTUAL DEFENSE” 


BY ELLIOTT McGINNIES 
University of Alabama 


Apparently disturbed by use of the 
concept “perceptual defense” to de- 
scribe the elevated recognition thresh- 
olds of observers confronted with emo- 
tionally-toned words, and dubious also 
of the notion of “pre-recognition emo- 
tionality” (3), Howes and Solomon (1) 
have contrived an interesting theory of 
their own to explain these results. At 
first reading their argument appears 
formidable, and the term “perceptual 
defense” does indeed seem to wither in 
the face of several impressive scatter- 
plots and their respective regression 
functions. Closer scrutiny of their dis- 
cussion, however, reveals that they have 
supported some very tenuous and in- 
consistent generalizations in order to 
buttress their theory. 

In effect Howes and Solomon state 
(1) that if we take into account the 
low frequency listings in the Thorndike- 
Lorge tables of emotionally-toned words, 
these words exhibit duration thresholds 
of the same order as for neutral words, 
(2) that the equation showing regres-
sion of duration thresholds on log fre-
quency of occurrence for neutral words
adequately predicts the relatively higher 
thresholds actually shown by taboo 
words, and (3) that the observers were 
probably unwilling to verbalize these 
words even after recognizing them, the 
implication again being that the thresh- 
olds for such words are actually close 
to those for neutral words, and that the 
pre-recognition galvanic skin responses 
of the observers were, in reality, indica- 
tions of conflict between tendencies to 
suppress the word and to follow the ex- 
perimenters’ instructions. It is difficult 
to see how these authors can support
the above three positions simultane-
ously, since several mutually exclusive
possibilities are present. Taboo words 
either have a higher duration threshold 
than neutral words or they do not. If, 
as our critics maintain at one point, the 
observers have deceived us and actually 
have perceived the critical words with 
nearly as much celerity as they did the 
neutral words, then the frequency hy- 
pothesis, if valid, has generated an er- 
roneous prediction. On the other hand, 
if the frequency hypothesis is sound, the 
statement that it “pretty well” predicts 
the thresholds of the taboo words bears 
closer examination. 

Thresholds. Considering in some 
greater detail the specific objections 
made by Howes and Solomon to the 
concepts of perceptual defense and pre- 
recognition emotionality, we may first 
examine their contention that the dura- 
tion thresholds of taboo as well as of 
neutral words vary as a function of their 
relative frequencies of usage. In sup- 
port of this generalization they have 
presented a scatter-plot based on the 
Thorndike-Lorge semantic counts (7) 
and the empirically derived recognition 
thresholds of words appearing in this 
list (1, Fig. 1). The correlation be- 
tween duration thresholds and the loga- 
rithms of the word frequencies reaches 
the significant value of -0.79. With
this particular function there can be no 
quarrel. The word percipience is cer- 
tainly less common than automobile and 
might be expected to present greater 
difficulty of recognition under conditions 
of rapid exposure. As a matter of fact, 
percipience occurs so rarely in print that 
it is not even listed in the Thorndike-
Lorge tables. Consequently Howes and
Solomon have arbitrarily assigned it a
log probability of -1.00 on the as-
sumption that it would occur at least
once in a sample of words ten times as 
long as the Thorndike-Lorge lists. They 
have treated uncoerced (also unlisted) 
in similar fashion. Now it would seem 
quite evident that written language and 
spoken language are not the same. The 
appearance in the bookstalls of fictional 
works designed to present the verbal be- 
havior of individuals as it actually oc- 
curs (cf., The Naked and the Dead, 
Mister Roberts, or any of Erskine Cald- 
well’s or James Farrell’s widely read 
novels) is generally an occasion for 
much flurry among the critics and re- 
viewers, not to mention those censorious 
individuals who are both enraged and 
titillated by the public display of our 
more picturesque speech habits. The 
writer is indeed gratified to learn that 
the conversations of Messrs. Howes and 
Solomon have remained uncorrupted by 
those hearty Anglo-Saxon expressions 
that are the common property of small 
boys, grown men, and the walls of pub- 
lic places. Nevertheless, it seems hardly 
fair to rank whore, bitch, belly and 
kotex along with such linguistic oddities 
as beatific, etcher, elegies, and vignettes 
with respect to frequency of usage. The 
writer has yet to be confronted in drug 
stores, grocery stores and department 
stores with stacks of cartons all promi- 
nently labeled KOHLRABI. (The 
word Kohlrabi, incidentally, occurred 
six times in eighteen million words 
according to Thorndike-Lorge.) Yet 
Howes and Solomon blithely assign penis
and kotex (not listed by T-L) log fre-
quencies of -1.00 along with percipi-
ence and uncoerced, and thereby achieve 
two anchor points for their extrapolated 
curve. 

Even if we grant some validity to this 
wholly arbitrary and presumptive pro- 
cedure of treating taboo words (which, 
by definition, are not encountered fre-
quently in literary sources) in the same
fashion as neutral words, the assump- 
tion that the data of our report can 
be subsumed under the same set of 
principles governing the regression of 
Howes’ and Solomon’s words upon log 
frequency is open to serious doubt. 
Since a major portion of their argument 
rests upon the contention that the dura- 
tion thresholds of the taboo words are 
actually predicted by their log fre- 
quencies of occurrence, a more precise 
evaluation of this assumed relationship 
is indicated. First, it should be noted 
that the regression line drawn by these 
authors through our neutral words was 
not derived by standard techniques. 
They took a log frequency value of 5.00 
as the origin and then proceeded to 
draw a line through the means of the 
two distributions, extrapolating this 
function through the plot of critical
words. Since their justification for such
a procedure is not elaborated, we may
forbear mention of the several criti-
cisms that might be leveled against this
method. In the figure, we have shown
the scatter-plot based upon our origi-
nal data. Three regression lines have
been drawn, all based upon actual ob-
servations. Two of these indicate the
regression of threshold upon log fre-
quency for the critical and neutral
words separately. The third shows the
regression function for the entire sam-
ple of words.

Fig. 1. Scatter-plot adapted from Howes
and Solomon (1) showing the relationship of
critical and neutral words to log frequency
and mean duration threshold. The regression
equations for the several functions indicated
are as follows: critical words, Y = -0.003X +
0.119; neutral words, Y = 0.001X + 0.061; all
words, Y = -0.018X + 0.116. Neutral words
are represented by circles. Critical, or taboo,
words are shown by triangles.

Several conclusions are evident upon 
examination of these functions. The 
relationship between duration thresh- 
olds and log frequencies predicted by 
the Howes-Solomon hypothesis is shown 
neither by the scatter-plot for the neu- 
tral words nor by that for the critical 
words. In both cases a regression line 
of approximately zero slope defines the 
relationship between threshold and fre- 
quency. Interestingly enough, if one 
combines the scatter-plots for the neu- 
tral and taboo words, the computed re- 
gression function closely resembles that 
suggested by Howes and Solomon. This 
fact, however, does not constitute other 
than circumstantial evidence for the as- 
sumption that the higher thresholds of 
the taboo words were determined by 
their alleged infrequency of occurrence. 
If, as we have contended, the elevated 
duration thresholds for these words were 
a function of their affective connota-
tions, the net regression effect would be
the same. Furthermore, if a strong 
tendency toward higher thresholds with 
decreasing log frequency were actually 
present, one would expect the regression 
lines computed separately for the neu- 
tral and taboo words to reveal some- 
thing of this trend. The fact that no 
such relationships exist indicates that 
the over-all regression effect could be 
merely an artifact resulting from the 
combining of two sets of threshold data 
the respective means of which have been 
determined by word meaning rather
than by frequency of occurrence in con- 
servative literary sources. It must be 
admitted, of course, that a possible con- 
founding effect introduced by frequency 
cannot entirely be discounted without 
further investigation. But no such ef- 
fect has been demonstrated with our 
data by the technique employed by 
Howes and Solomon. 
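
The argument that a pooled regression may be an artifact of combining two groups with different means can be illustrated with a small sketch. The data points below are hypothetical, not the thresholds of the original experiment; they are chosen so that each group alone shows no trend, and the helper `ols` is an ordinary least-squares fit written out for the purpose.

```python
def ols(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (log frequency, threshold in seconds) values.
# Neutral words: higher log frequency, low thresholds, no trend within the group.
neutral_x = [1.0, 1.5, 2.0, 2.5, 3.0]
neutral_y = [0.06, 0.07, 0.06, 0.07, 0.06]
# Critical words: lower log frequency, high thresholds, no trend within the group.
critical_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
critical_y = [0.12, 0.11, 0.12, 0.11, 0.12]

neutral_slope, _ = ols(neutral_x, neutral_y)
critical_slope, _ = ols(critical_x, critical_y)
pooled_slope, _ = ols(neutral_x + critical_x, neutral_y + critical_y)

print(neutral_slope, critical_slope, pooled_slope)
```

With these assumed values both within-group slopes are essentially zero, while the pooled slope is clearly negative. The sketch shows only that such a pattern is arithmetically possible; it does not decide whether meaning or frequency produced the difference between the group means.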

Pre-recognition emotionality. In as- 
sessing the data relating “emotionality” 
of the exposed words to pre-recognition 
measurements of galvanic skin response, 
Howes and Solomon entrench them- 
selves behind the conviction that the ob- 
servers invariably recognized the taboo 
words before they were willing to ver- 
balize them. It must be admitted that 
this possibility does, in fact, constitute 
one of the knottier problems in this type 
of research. Consequently, a distinction 
between “discrimination” and “aware- 
ness” seems not only justified but neces- 
sary. Discrimination is inferred from 
differential patterns of effector activity 
in response to varying stimulus config- 
urations. The observer may or may not 
be aware or “conscious” of the fact that 
he has achieved such a discrimination. 
Evidence from other sources indicates 
that observers are capable of differential 
response to stimuli without, at the same 
time, being able to verbalize the basis 
for their discriminatory reactions (5, p. 
153). Awareness, then, becomes simply 
the capacity to verbalize a discrimina- 
tion which, if necessary, could be com- 
municated through other channels. Any 
discrimination of the “verbal report” 
variety, of course, involves that integra- 
tive phase referred to as the observer’s 
“immutably private experience.” This
aspect of behavior, by definition, is not 
susceptible to direct public observation. 
Arguments concerning the admissibility 
of private experience to the psychologi- 
cal laboratory have been summarized by 
Pratt (6) and need not be discussed 
here. But the postulation of a central
integrative process that can mediate 
discrimination without necessarily bring- 
ing about awareness (in the sense of ac- 
curate perception) is neither tautologous 
nor unreasonable. 

Reference to the fact that elevated 
GSR’s occurred prior to verbal report 
of emotionally-toned words as “dis- 
crimination without awareness” depends 
on the assumption that the observers 
did not suppress prompt verbalization 
of their discriminatory experiences. In 
order to obtain data which might help 
elucidate the processes underlying the 
apparent imperception of observers to 
socially taboo symbols, we recorded all 
pre-recognition hypotheses volunteered 
by the observers. These, as discussed 
in the original paper, were classified ac- 
cording to whether they were structur- 
ally similar to or unlike the stimulus 
word, nonsense, or part responses.
Analysis of the frequency with which
pre-recognition responses fell into each
of these categories revealed that the ob-
servers made significantly more similar
and part hypotheses when viewing the
neutral words, and significantly more
unlike and nonsense hypotheses when
confronted with the critical (taboo)
words. If one interprets the patterning 
of pre-recognition guesses from the point 
of view of their “adaptiveness” to the 
observers, it may reasonably be sup- 
posed that the formulation of hypothe- 
ses which are structurally unlike the 
stimulus word represents a kind of 
avoidance behavior. The eliciting of 
nonsense hypotheses by the emotion- 
ally-toned words would also seem to 
indicate avoidance of meaning. Both 
of these tactics, together with the 
threshold findings, can quite economi- 
cally be subsumed under the rubric of 
perceptual defense. It seems improb-
able, as Howes and Solomon would have
to maintain, that the observers spon-
taneously and consistently adopted these
patterns of hypothesis formation when
faced with taboo words and reverted to 
a different pattern of verbal-report be- 
havior when viewing the neutral stimuli. 
In their general criticism of the concept 
of perceptual defense, they fail even to 
attempt application of their “suppres- 
sion” theory to the content analysis of 
the data. 

Not unmindful of the possibility that 
the observers in such an experiment 
might occasionally withhold verbaliza- 
tion of emotionally-toned words, despite 
their disavowal of such an intent, the 
writer and his colleagues conducted 
some additional research to control pre- 
cisely this factor (4). This is not yet 
published, and so Howes and Solomon 
were not aware of the findings when 
preparing their review. Since the pres- 
ent issue cannot be exploited without 
reference to these additional data, we 
may summarize them briefly: The dura- 
tion thresholds of 20 observers were ob- 
tained for eight neutral words, selected 
on the basis of their having approxi- 
mately the same frequencies of occur- 
rence as indicated by the Thorndike- 
Lorge tables. All words consisted of 
five letters. Before each exposure in 
the ascending series of duration inter- 
vals used to obtain the threshold for a 
given word, the observer was required 
to view at full-recognition exposure 
either another neutral word, or a taboo 
word, such as bitch or whore. Four of 
the eight neutral words for which 
thresholds were obtained were always 
preceded by an emotionally-toned word; 
the other four always followed a neu-
tral word. The mean recognition
thresholds of the four words which had 
before each exposure been introduced 
by a taboo word, however, were signifi- 
cantly higher than the duration limens 
of the four words which had always 
been preceded by exposure of a neutral 
word. It should be emphasized that the 
observers were not required to verbal-
ize the emotionally-laden words; they
merely had to look at them. All of the 
thresholds were obtained for neutral 
words, balanced with respect to length 
and frequency. 

While these results lend themselves 
to several interpretations (discussed in 
the complete article), it is clear that 
perception of a socially taboo word 
interferes with the subsequent recog- 
nition of a neutrally-toned word. Ob- 
viously word meaning over and above 
such structural characteristics of the 
word as length, syllable composition, or 
frequency of occurrence must be con- 
sidered in predicting the recognition be- 
havior of individuals. The writer will 
be among the first to welcome a mathe- 
matical or quasi-mechanical statement 
of such perceptual processes as have re- 
cently been demonstrated with socially- 
relevant stimuli. But the actual or 
symbolic adaptiveness of such percep-
tual behavior in a given individual may
involve processes not readily predictable
in terms of general probabilities derived
from the population at large. Such
terms as defense and vigilance, with re- 
spect to perceptual functions, may yet 
prove to have more than heuristic value 
to the investigator. 

In view of the demonstrated fact that 
temporal association between a critical 
and a neutral word will raise the recog- 
nition threshold of the latter, the ob- 
tained differences in duration thresholds 
between neutral and taboo words in
McGinnies’ experiment cannot sum- 
marily be discounted on the assumption 
that the observers withheld saying the 
critical words until several exposures 
after correct recognition occurred. The 
highly significant value of Fisher’s t
derived from this data furnishes assur- 
ance that a significant difference would 
exist even had suppression of verbal re- 
port occurred in several instances. Cor- 
roborative evidence by McCleary and 
Lazarus (2) lends support to our con-
clusion that the elevated GSR’s of our
observers represented actual pre-recog- 
nition responses to the toned words. 
The suggestion of a “feed-back mecha- 
nism” whereby central integrative, and 
perhaps even sensory, processes are 
modified in the direction of a reenforc- 
ing perceptual state deserves further at- 
tention. Parsimony, contrary to the 
contentions of Howes and Solomon, is 
not served by attempts to force all per- 
ceptual data into the framework dic- 
tated by rigid adherence to one particu- 
lar theory. While the high correlation 
found between log frequency and dura- 
tion thresholds may hold implications 
for understanding the recognition be- 
havior of individuals confronted with 
affectively-neutral stimuli, the mere ex- 
trapolation of this relationship and its 
uncritical application in the case of per- 
ceptual response to emotionally toned 
words is highly questionable. 

Returning to a lighter vein, the writer 
can only conclude that Howes and Solo- 
mon, in attempting a blanket applica- 
tion of their frequency theory, have 
marketed a questionable product. Sev- 
eral adulterations are apparent: (1) 
They have concocted a visually sugges- 
tive scatter-plot from our data, gar- 
nished it with an insidious and highly 
convincing regression line, but failed to 
indicate that this slope is achieved 
through the wedding of two distribu- 
tions whose respective functions do not 
support the hypothesis in question. (2) 
Bemused apparently by a pious belief 
that the communicative symbols em- 
ployed verbally by undergraduates re- 
flect only such literary influences as 
Black Beauty, Little Women, and the 
Ladies’ Home Journal, they have in- 
voked “common morality” and have 
condemned to negative logarithms those 
very expressions that never fail us in 
time of adjectival or appellative neces-
sity. (3) They have conveniently over-
looked (perceptual defense perhaps?) or
ignored the implications for an adaptive
theory of perception contained in our 
analysis of pre-recognition hypotheses 
and have passed off too lightly the rele- 
vant supporting findings of McCleary 
and Lazarus. (4) Accomplishing from 
Cambridge, Massachusetts what we 
were unable to do at close range, they 
have penetrated the initial privacy of 
our observers and have discovered 
therein a foul plot to deceive the experi- 
menters. Additional experimental data 
obtained in our laboratories indicate
that the effects of any such unconscion- 
able malingering would detract only 
slightly from the observed effect. The 
concepts of “perceptual defense” and 
“pre-recognition emotionality” may 
eventually fall, but not before an on- 
slaught of such uncertain proportions as 
this. 

MULTIPLE OPERATION MEASUREMENT


BY ROBERT GLASER ¹
University of Kentucky 


INTRODUCTION 


The lack of a systematic orientation 
in the field of psychological testing has 
been pointed out by a number of 
writers. For example, an address by 
Cureton to the Psychometric Society 
in 1946 contained the following: 


The Psychometric Society was 
founded eleven years ago today. A 
great deal has been accomplished during 
these years in developing and applying 
quantitative methods. On the other 
hand, at least in the areas where the 
chief working tool is the psychological 
test, very little has been done toward the 
development of a rational science. In 
consequence, many otherwise excellent 
mathematical studies have taken their 
starts from assumptions which do not 
correspond with the actualities of test 
structures and experimental controls. 
It is time to reverse this trend, and to 
emphasize and develop the rational 
foundations of mental measurement (2, 
p. 191). 


The primary concepts of psycho- 
logical testing, reliability, validity, 
item analysis, and scaling, exist as 
almost separate entities which appear 
to have “just growed.” The estab-
lishment of a rigorous and interrelated
framework for the construction and 
evaluation of psychological tests re- 
quires the systematic development of 
basic concepts in the field. Loevinger 
(8) has written recently that these 
problems “. . . are first problems in 
the sense that their solution is pre-
supposed by the most powerful instru-
ment of test analysis now available,
namely, factor analysis” (8, p. 3).

1 This paper was part of a Ph.D. thesis done
under the direction of Dr. Douglas G. Ellson
at Indiana University. The writer wishes to
express his appreciation to Professor Ellson for
his assistance and encouragement throughout
the entire course of this development.

It is the thesis of this paper that a 
systematization of the concepts of 
psychological testing can be brought 
about by taking the position that tests 
are measurements; that they are not 
a unique kind of measurement, but 
are measurements which require the 
same kinds of operations that are re- 
quired by measurements in other sci- 
ences. The adoption of such a posi- 
tion may lead to a clarification of 
present testing concepts or to a new 
set of measurement concepts. 

Measurement in general can be 
defined as the assignment of numbers 
to represent objects or events. The 
properties of the things measured 
determine the characteristics of the 
assigned numbers. Campbell (1, p. 
126) defines physical measurement as 
“. . . the assignment of numerals to
represent properties of material sys-
tems other than number, in virtue of
the laws governing these properties.”
Stevens (12, p. 677) writes that, “The
type of scale achieved depends upon 
the character of the basic empirical 
operations performed. These opera- 
tions are ordinarily limited by the 
nature of the thing being scaled and 
by our choice of procedures. . . .” 

The characteristics of behavioral 
measurement, like all scientific meas- 
urements, are determined by the prop- 
erties of the subject-matter involved. 
In measuring behavior, then, the laws 
of behavior, or response properties in 
certain stimulus situations, determine 
the characteristics and applications 
of the scales of measurement which 
result from the performance of certain 
measuring operations. 
It is the purpose of this study to
analyze a particular kind of measure- 
ment operation employed in psycho- 
logical testing, to present a theoretical 
model for the stimulus-response prop- 
erties of this situation and to indicate 
the possible applications of this model. 


MULTIPLE OPERATION 
MEASUREMENT 


The method of measurement under 
consideration here has been called 
multiple operation measurement or 
the graded criteria method by Ellson 
(3, p. 8), who has described it as fol- 
lows: 


We can measure the diameter of a ball 
in a single operation using a caliper or 
micrometer or we can measure it by at- 
tempting to pass it through a series of 
holes of differing diameters. This sec- 
ond method requires at least two opera- 
tions, one to determine the smallest hole 
through which the ball will pass and one 
to determine the largest through which 
it will not pass. Usually many more 
than two operations are necessary to 
obtain a single measurement. Essenti- 
ally, the method consists of a series of 
criteria more or less evenly spaced along 
a continuum. To obtain a measurement
the object must pass one or more criteria 
at one end of a scale and fail one or more 
at the other end. A measurement is ob- 
tained at the place on the scale where the 
object no longer passes but begins to 
fail. If the object passes or fails all the 
available criteria no measurement is 
obtained. 
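
Ellson's description of the graded criteria method may be restated as a short sketch. The function name `measure_by_holes`, the hole diameters, and the rule that a ball passes a hole whose diameter is at least its own are assumptions made for illustration; the passage itself specifies only the logic of passing at one end of the scale and failing at the other.

```python
def measure_by_holes(ball_diameter, hole_diameters):
    """Locate a ball on a scale of graded holes.

    Returns (largest hole failed, smallest hole passed), or None when the
    ball passes or fails every criterion and no measurement is obtained.
    """
    holes = sorted(hole_diameters)
    failed = [h for h in holes if ball_diameter > h]   # ball will not pass
    passed = [h for h in holes if ball_diameter <= h]  # ball will pass
    if not failed or not passed:
        return None
    return max(failed), min(passed)


holes = [1.0, 1.5, 2.0, 2.5, 3.0]       # hypothetical, evenly spaced criteria
print(measure_by_holes(2.3, holes))     # bracketed between two adjacent holes
print(measure_by_holes(0.5, holes))     # passes every hole: no measurement
print(measure_by_holes(3.5, holes))     # fails every hole: no measurement
```

The measurement is an interval rather than a point, and its width is fixed by the spacing of the criteria, which is why many operations are usually needed for a single fine-grained measurement.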

Loevinger has called psychological 
tests which conform to the method of 
multiple operation measurement, 
cumulative tests. She has described 
them as follows: 


In a perfectly homogeneous cumula- 
tive test when the items are arranged in 
order of decreasing popularity, each 
person from some defined population 
will score plus up to an item characteriz- 
ing him and minus on all subsequent 
items. 
In the case of tests of ability, clearly
if two items measure the same ability, 
then the ability to do the harder pre- 
supposes the ability to do the easier 
item. When the items are arranged 
according to difficulty, everyone will 
succeed up to a certain item, the one 
characterizing his level of ability, and 
fail all subsequent items, provided we 
have succeeded in our aim of construct- 
ing a perfectly homogeneous test. 

In order to generalize the idea of 
cumulative tests, substitute for difficulty 
the complementary concept, popularity. 
The popularity of the item will be the 
proportion of the group scoring plus on 
that item. 

Obviously, the appropriate way to 
score such a test is to count the number 
of pluses. (9, p. 508.) 


Guttman has also based his work 
on multiple operation measurement 
and has described it in somewhat 
different terms. 


Consider a mathematics test com-
posed of the following problems:

(a) If r is the radius of a circle, then
what is its area?

(b) What are the values of x satisfying
the equation $ax^2 + bx + c = 0$?

(c) What is $de^x/dx$?


If this test were given to a population 
of members of the American Sociological 
Society, we would perhaps find it to form 
a scale for that population. The re- 
sponses to each of these questions might 
be reported as a dichotomy, right or 
wrong. There are 2 × 2 × 2 = 8 possi- 
ble types for three dichotomies. Actu- 
ally, for this population of sociologists 
we would probably find only four of the 
possible types occurring. There would 
be the type which would get all three 
questions right, the type which would 
get the first and second questions right, 
the type which would get only the first 
question right, and the type which would 
get none of the questions right. Let us 
assume that this is what would actually 
happen. That is, we shall assume the 
other four types, such as the type getting 
the first and the third question right but 
the second question wrong, would not 
occur. In such a case, it is possible to 
assign to the population a set of numeri- 
cal values like 3, 2, 1, 0. Each member 
of the population will have one of these 
values assigned to him. This numerical 
value will be called the person’s score. 
From a person’s score we would then 
know precisely to which problems he 
knows the answers and to which he does 
not know the answer. Thus a score of 
2 does not mean simply that the person 
got two questions right, but that he got 
two particular questions right, namely, 
the first and second. A person’s be- 
havior on the problems is reproducible 
from his score. (7, p. 143.) 
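
Guttman's example may be sketched as follows. The sketch assumes, as in the quotation, that the three problems are ordered from easiest to hardest and that the scale is perfect; the function name is hypothetical.

```python
from itertools import product


def pattern_from_score(score: int, n_items: int) -> tuple:
    """On a perfect scale, a score of k means the k easiest items are right."""
    return tuple(1 if i < score else 0 for i in range(n_items))


n_items = 3  # problems (a), (b), (c), assumed ordered from easiest to hardest
all_types = list(product([0, 1], repeat=n_items))
scale_types = [pattern_from_score(k, n_items) for k in range(n_items, -1, -1)]
non_scale_types = [t for t in all_types if t not in scale_types]

print(len(all_types))        # 8
print(scale_types)           # [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0)]
print(len(non_scale_types))  # 4
print(pattern_from_score(2, n_items))  # (1, 1, 0)
```

The last line shows the sense in which a person's behavior is reproducible from his score: a score of 2 returns the pattern in which the first and second problems, and only those, are right.
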


Multiple operation measurement is 
a basic procedure in psychophysics. 
The determination of an absolute 
threshold by the method of constant 
stimuli involves the presentation of a 
single stimulus to a subject who is 
instructed to respond to it in one way 
(plus) or another (minus). Each 
stimulus is presented a large number 
of times and on successive trials 
stimuli of varying magnitude are pre- 
sented. In this situation it is found 
that when the stimuli are ordered in 
terms of magnitude along the scale of 
measurement, there is not an abrupt 
change from plus responses to minus 
responses, but rather that there is 
“error of measurement” due to sub- 
ject variability or lack of precision 
of the test which is indicated by a 
gradual transition from plus to minus 
responses. This transition, when 
either plus or minus responses are 
plotted as a function of stimulus 
magnitude, in many instances approxi- 
mates the form of the integral of the 
normal probability curve. A meas- 
urement, the threshold, is taken as an 
arbitrary point along this curve. 
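
A minimal sketch of one way of taking such a point is given below. It assumes that the threshold is taken at the 50 per cent point and that it is found by linear interpolation between adjacent stimuli; both choices, the data, and the function name are assumptions for illustration and are not prescribed by the method itself.

```python
from typing import Sequence


def threshold_50(magnitudes: Sequence[float], prop_plus: Sequence[float]) -> float:
    """Linearly interpolate the magnitude at which 50% of responses are plus.

    Assumes magnitudes are increasing and prop_plus rises through 0.5.
    """
    pairs = list(zip(magnitudes, prop_plus))
    for (m0, p0), (m1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            return m0 + (0.5 - p0) * (m1 - m0) / (p1 - p0)
    raise ValueError("proportions do not cross 0.5")


# Hypothetical data: proportion of plus responses at each stimulus magnitude.
magnitudes = [1.0, 2.0, 3.0, 4.0, 5.0]
prop_plus = [0.0, 0.125, 0.25, 0.75, 1.0]
print(threshold_50(magnitudes, prop_plus))  # 3.5
```
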

Many psychological tests can be 
considered as instances of multiple 
operation measurement. The em- 
ployment of this kind of measurement 
operation in testing is analogous to 
its use in psychophysics. A score on 
such tests is obtained by the presenta- 
tion of a series of items presumably 
spaced along a scale on which the test 
is designed to measure. A subject 
responds to each item by a plus or 
minus response, e.g., right or wrong, 
normal or deviant, favorable or un- 
favorable, yes or no, etc. Each item 
is usually presented once and the 
ordinal scale position of each item is 
determined by the proportion of the 
test group responding plus on that 
item. The number of items having 
different proportions of plus responses, 
i.e., the number of items at each scale 
position, usually varies. For any 
subject a measurement, the score, is 
determined by counting the number of 
plus responses. 

Multiple operation measurement, 
then, appears to be a widely employed 
method of measurement and upon it 
have been based the contributions of 
Loevinger and Guttman. Let us 
now turn to an analysis of the response 
properties of the stimulus situation 
presented when this method of meas- 
urement is employed in psychological 
testing. 


THE RESPONSE PROPERTIES OF 
MULTIPLE OPERATION 
MEASUREMENT IN 
PSYCHOLOGICAL 
TESTING 


Since certain psychophysical meth- 
ods and psychological testing methods 
are examples of multiple operation 
measurement, it can be assumed that 
the response properties of the two 
situations are similar because, as men- 
tioned above, both involve similar 
stimulus situations. The correspond- 
ence between psychophysics and test- 
ing has been pointed out by such 
writers as Guilford (6) and Mosier 
(10, 11). Thus, when test items are 
ordered in terms of their scale posi- 
tion along a scale of measurement, it 
should be found, as in the psycho- 
physical situation, that there is a 
gradual transition from plus to minus 
responses. If the score point (which 
can be considered as a measurement 
equivalent to a threshold) is marked 
on the scale according to the ordinal 
position of the items, e.g., a score of 
65 would be marked at the 65th item 
on the scale, this transition zone 
should be distributed about it. 

One way of looking at the responses 
to test items in this transition zone is 
to consider them as inconsistent re- 
sponses, i.e., on successive administra- 
tions of a test a subject would respond 
sometimes plus and sometimes minus 
to the same item, as is the case with 
psychophysical stimuli about the 
threshold. The occurrence of incon- 
sistent responses would be a function 
of the probability, as indicated by the 
psychometric ogive, of a plus or 
minus response at the scale positions 
of the items involved. A distribu- 
tion of inconsistent responses for 
items along the scale in the transition 
area should be the derivative of this 
ogive and can be assumed to approach 
normality under certain conditions. 
Figure 1 shows this distribution. If 
a test score is considered as equivalent 
to a 50 per cent threshold where items 
are responded to 50 per cent of the 
time plus and 50 per cent of the time 
minus, then the maximum of the dis- 
tribution of inconsistent responses 
should fall at the score point on the 
scale. 

Fig. 1. The distributions of minus responses and of inconsistent responses on 
a uni-dimensional test. 
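
The relation between the ogive and its derivative may be checked numerically. The sketch below assumes a normal ogive for the minus responses with a hypothetical score point of 65 and a hypothetical standard deviation of 4 items; the derivative is estimated by a central difference and is found to reach its maximum at the score point.

```python
import math


def ogive(x: float, mean: float, sd: float) -> float:
    """Integral of the normal curve: assumed probability of a minus response at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ogive_slope(x: float, mean: float, sd: float, h: float = 1e-4) -> float:
    """Central-difference estimate of the derivative of the ogive at x."""
    return (ogive(x + h, mean, sd) - ogive(x - h, mean, sd)) / (2.0 * h)


score_point, sd = 65.0, 4.0  # hypothetical score point and spread
positions = range(50, 81)    # item positions around the transition zone
slopes = [ogive_slope(float(x), score_point, sd) for x in positions]
peak = max(zip(slopes, positions))[1]

print(ogive(score_point, score_point, sd))  # 0.5
print(peak)                                 # 65
```
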

Two hypotheses, then, can be made 
concerning the response character- 
istics of multiple operation measure- 
ment in psychological tests: (1) on a 
uni-dimensional test when the items 
are ordered in terms of their scale 
position there is exhibited a distribu- 
tion of inconsistent responses around 
the score point in the zone of transi- 
tion from plus to minus responses. 
It can be assumed that this distribu- 
tion should approach normality since 
the frequency of occurrence of the 
inconsistent responses is a function of 
the occurrence of plus and minus 
responses along the psychometric 
ogive and should be the derivative of 
this ogive. Specifically, this re- 
sponse model can be assumed to be 
the case for a test employing the 
method of multiple operation meas- 
urement, on which the items require 
either a plus or minus response (e.g., 
right or wrong) with the probability 
of getting a plus response by guessing 
equal to zero. The variability of this 
distribution is a function of the dis- 
criminatory power of the test, similar 
to the psychophysical measures of 
precision. (2) A test score may be 
considered as an estimate of the 
maximum point of the distribution of 
inconsistent responses and should cor- 
respond to the modal or mean point 
of this distribution. 


THE RELATIONSHIP BETWEEN 
INCONSISTENCY OF RESPONSE 
AND TEST SCORE 


The above postulated relationship 
between test score and inconsistency 
of response offers an answer to a 
problem with which a number of in- 
vestigators have been concerned. It 
is the relationship between test score 
and the consistency of response pat- 
tern on a test. An estimate of the 
number and distribution of inconsist- 
ent responses may be obtained from 
two successive administrations of a 
test by determining the number and 
position along the scale of measure- 
ment of those items on which re- 
sponses change from plus to minus or 
minus to plus. Whether or not a sub- 
ject can exhibit a complete distribu- 
tion of inconsistent responses depends 
upon the position of the individual’s 
score on the scale of measurement and 
the range of scale position of the test 
items for the group taking the test. 
Distributions of inconsistent responses 
at the extremes of the scale of meas- 
urement may be truncated. This has 
been elaborated in a previous study 
by the writer (5) and in a paper by 
Ellson and Glaser (4) and can be 
explained as follows: 

Figure 2 shows the effect of various 
scale ranges upon a subject’s distribu- 
tion of inconsistent responses. In 
Case I in Fig. 2, a test is given to 
subjects B, C, and D whose inconsist- 
ent responses are shown as more or 
less normal distributions on the scale 
of measurement. Assuming equal in- 
consistency for these subjects a zero 
correlation between test score and 
number of inconsistencies would be 
expected. In Case II this same test 
is given to subjects C, D, and E. 
Here subject E, who has a high score 
and who is potentially as inconsistent 
as the others, receives a low inconsist- 
ency count. The shaded area indi- 
cates the inconsistent responses this 
subject could not make because items 
of the necessary level were not in- 
cluded in the test. For a group of 
such subjects as C, D, and E, there 
would be a negative correlation be- 
tween inconsistency and test score. 
Those subjects with a high test score 
would exhibit a low inconsistency 
count. In Case III the test is given 
to subjects A, B, and C. Here sub- 
ject A, who has a low test score, has 
his inconsistency count restricted be- 
cause the range of the scale position 
of the test items does not extend low 
enough. Consequently in this case 
there would be a positive correlation 
between inconsistency and test score. 
In Case IV the test is given to a wide 
range of subjects like A, B, C, D, and 
E. In this case, subjects with both 
high and low test scores have their 
inconsistency counts restricted. This 
would result in a close-to-zero product- 
moment correlation between incon- 
sistency and test score. However, a 
curvilinear correlation coefficient, such 
as eta, would be expected to be 
greater than zero in this case. 
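
The effect of truncation on the correlation may be illustrated by a rough numerical sketch. It assumes a test of 100 items at scale positions 1 to 100, a normal distribution of inconsistent responses with a standard deviation of 5 items for every subject, and a relative inconsistency count obtained by summing the normal density over the items actually on the test. These figures and the function names are assumptions and do not come from the studies cited.

```python
import math
from typing import List


def expected_inconsistencies(score: float, n_items: int, sd: float) -> float:
    """Sum a normal density centred on `score` over the items on the test.

    Items are assumed to sit at positions 1..n_items; density falling outside
    that range is the portion the subject cannot exhibit.
    """
    def density(x: float) -> float:
        return math.exp(-0.5 * ((x - score) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    return sum(density(float(i)) for i in range(1, n_items + 1))


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Product-moment correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


N_ITEMS, SD = 100, 5.0         # hypothetical test length and spread
case_ii = [50.0, 75.0, 100.0]  # range inadequate at the upper end
case_iii = [1.0, 25.0, 50.0]   # range inadequate at the lower end

for label, scores in (("Case II", case_ii), ("Case III", case_iii)):
    counts = [expected_inconsistencies(s, N_ITEMS, SD) for s in scores]
    print(label, [round(c, 2) for c in counts], round(pearson_r(scores, counts), 2))
```

Under these assumptions the subject at the upper extreme in Case II and the subject at the lower extreme in Case III show roughly half the relative count of the others, so that the correlation is negative in the former case and positive in the latter.
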


Fig. 2. Hypothetical distributions of inconsistent responses for four groups of subjects given 
the same test. 


To recapitulate, the following hy- 
potheses can be made: 


(1) When the range of test items is 
adequate for the group being tested, 
as it is in Case I, a zero correlation be- 
tween inconsistency and test score 
can be expected. 

(2) When, as in Case IV, the range 
of a test is inadequate for a group at 
both ends of the scale of measurement, 
a near zero product-moment correla- 
tion and an eta greater than zero be- 
tween inconsistency and test score 
can be expected. 

(3) When the test range is inade- 
quate at the upper end of the scale 
as in Case II, the mean of the group 
tested would be close to the maximum 
score possible on the test, and a 
negative correlation between test score 
and inconsistency should be expected. 

(4) When the test range is inade- 
quate at the lower end of the scale, as 
in Case III, where the mean test score 
would be close to the minimum possi- 
ble score on the test, a positive cor- 
relation between test score and in- 
consistency should be obtained. 


AN ASSUMED MODEL FOR THE 
RESPONSE PROPERTIES OF 
TWO-DIMENSIONAL TESTS 


Up to this point our analysis of the 
response properties of multiple opera- 
tion measurement on psychological 
tests has been concerned with essenti- 
ally uni-dimensional tests. Let us 
now consider multi-dimensional tests, 
restricting the analysis to the two- 
dimensional case. Suppose a test 
consists of two kinds of items, X items 
and Y items. Each of these kinds of 
items is associated with a particular 
dimension and is distributed more or 
less uniformly over a scale of meas- 
urement. A subject’s score on such a 
test would consist of the number of X 
and Y items to which he responded 
plus. Depending upon the subject’s 
relative levels of performance on each 
of the test dimensions, i.e., the num- 
ber of X items to which he responds 
plus compared with the number of Y 
items to which he responds plus, the 
pattern of responses can be assumed 
to have varying properties. 

Figure 3 shows the case where a 
subject or group of subjects score low 
(few plus responses) on the X items, 
and score high (many plus responses) 
on the Y items. In the framework of 
multiple operation measurement, the 
following model of the pattern of re- 
sponses along the scale of measure- 
ment can be assumed for this case: 
There occur two distributions of in- 
consistent responses corresponding to 
a subject’s performance on each di- 
mension. Between them lies the XY 
score point which is the subject’s 
score on this two-dimensional test. 
Proceeding from the lower end of the 
scale to the upper end, a subject would 
respond plus on both X and Y items 
up to the beginning of the lower dis- 
tribution of inconsistent responses. 
Here there would be a distribution of 
inconsistent responses on the X items, 
the dimension of the lower level of 
performance. Above this a subject 
would respond minus to the X items 
and continue to respond plus to the 
Y items up to the distribution of in- 
consistent responses on the Y items. 
There would then occur a distribution 
of inconsistent responses on the Y 
items. Above this a subject would 
respond minus to both X and Y items. 
The response model can be assumed 
to be the same if the relative levels 
of performance on the X and Y di- 
mensions are reversed, a subject 
scoring low on the Y items and high 
on the X items. However, while the 
model is the same, the plus and minus 
responses within and between the 
upper and lower inconsistency dis- 
tributions would occur on different 
items for each of the two classes of 
subjects. 

As a subject’s performances on the 
two dimensions fell closer together on 
the scale of measurement the area 
between the two inconsistency dis- 
tributions would become smaller; as 
the performances fell still closer the 
area between the two distributions 
would be obliterated and the X- and 
Y-item inconsistent response distribu- 
tions would overlap. If a subject’s 
levels of performance were equal on 
both dimensions the two distributions 
would coincide. 
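
The assumed model may be sketched for a single subject as follows. The sketch ignores inconsistent responses, so that each response is determined solely by whether the item lies at or below the subject's level on its own dimension. The test of twelve alternating X and Y items and the two levels are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[int, str]  # (scale position, dimension "X" or "Y")


def respond(items: List[Item], level_x: float, level_y: float) -> List[int]:
    """Plus (1) when the item lies at or below the subject's level on its dimension."""
    levels = {"X": level_x, "Y": level_y}
    return [1 if pos <= levels[dim] else 0 for pos, dim in items]


# Hypothetical 12-item test: X and Y items alternate along the scale.
items = [(pos, "X" if pos % 2 else "Y") for pos in range(1, 13)]

# Subject low on X (level 3) and high on Y (level 9).
pattern = respond(items, level_x=3, level_y=9)
print(pattern)       # [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(sum(pattern))  # 6
```

The total score of 6 falls between the two levels of performance, and between those levels the pluses occur only on Y items and the minuses only on X items.
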


THE IDENTIFICATION OF 
TEST DIMENSIONS 


The fact that plus or minus re- 
sponses in certain situations may be 
related to items belonging to particu- 
lar dimensions and the fact that a 
subject’s performance on each dimen- 
sion on a two-dimensional test can be 
estimated by the modal or mean point 
of his inconsistent response distribu- 
tions, offers a method for identifying 
the items as belonging to one dimen- 
sion or the other. The development 
of such a method would involve the 
separation of those subjects who 
scored low on the X items and high 
on the Y items from those subjects 
whose score consisted of low perform- 
ance on the Y items and high per- 
formance on the X items. Only then 
after separating each group could the 
items be identified as belonging to one 
scale or the other by determining the 
prevalent kind of response, plus or 
minus, for each item. Several meth- 
ods can be employed to accomplish 
this. 

(I) The first method is developed 
as follows²: Let us consider a two- 
dimensional test consisting of $N_x$ 
class X items and $N_y$ class Y items. 


$$N = N_x + N_y, \quad N_x \neq N_y. \tag{1}$$


These items are combined into a single 
test Z, the items of which are ordered 
according to their per cent plus re- 
sponses for some population. The 
two classes of items, X and Y, are 
distributed more or less uniformly 
over the range of Z. The items are 
assigned numbers according to their 
ordinal position on the scale of meas- 
urement for Z and in this way a vari- 
able z is defined which takes on the 
values of the numbers assigned to the 
Z class of items. In the same manner 
variables x and y are identified with 
the ordered items of subclasses X and 
Y. On a two-dimensional test each 
Z is either an X or a Y. Hence, to 
each x and y there can be co-ordinated 
another z, $z_x$ or $z_y$. Aside from the 
effect of inconsistent responses, the 
total score (z) of any subject is made 
up of his score on the X items and his 
score on the Y items, so that: 


$$z = x + y. \tag{2}$$


For each individual there is a value 
of x, of y, of $z_x$, of $z_y$, and of z. The 
value of z is the subject’s test score. 
The values of x and y are unknown 
since the items of subclasses X and Y 
are not identified and separated. How- 
ever, the numbers $z_x$ and $z_y$, which 
correspond to x and y, can be deter- 
mined for each subject from the point 
of the maximum number of inconsist- 
ent responses. On a two-dimensional 
test there would be two such points, 
noticeably separate for some indivi- 
duals and merged for others, depend- 
ing upon whether the subject’s levels 
of performance are different or nearly 
the same for the two dimensions. 


2 The writer is indebted to Dr. C. J. Burke 
of Indiana University for this formulation. 


Fig. 4. Diagrammatic illustration of the quantity $(z - z_l)/(z_u - z_l)$ for Class “A” and Class “B” subjects. 
(The case is shown where there are more $z_x$ items on the test than there are $z_y$ items.) 
Axes: frequency of inconsistent responses against scale of measurement (z scale). 

For some subjects, Class A, 

$$z_x > z_y. \tag{3}$$

For other subjects, Class B, 

$$z_x < z_y. \tag{4}$$

For subjects in Class A, 

$$z_y < z < z_x. \tag{5}$$

The number of items between: 

$$\begin{aligned}
z \text{ and } z_y &\text{ is } (z - z_y), \\
z \text{ and } z_x &\text{ is } (z_x - z), \\
z_x \text{ and } z_y &\text{ is } (z_x - z_y).
\end{aligned} \tag{6}$$

The relative sizes of classes X and 
Y to the class Z are: 

$$\frac{N_x}{N} \text{ and } \frac{N_y}{N} \tag{7}$$

and since X and Y items are assumed 
to be uniformly distributed over the 
test range, the fractions in (7) should 
give the relative numbers of the two 
kinds of items on any portion of the 
scale. 

Since $N_x$ and $N_y$ are unknown, 
$N_x/N$ and $N_y/N$ can be replaced by the 
constants $a_x$ and $a_y$. These con- 
stants are subject to the restriction 
from (1) that $a_x + a_y = 1$, so only a 
single constant is needed 

$$a = a_x = 1 - a_y. \tag{8}$$

From these assumptions: 

$$x = a z_x \text{ and } y = (1 - a) z_y. \tag{9}$$

And from (2) and (9) 

$$z = a z_x + (1 - a) z_y. \tag{10}$$

From (10) 

$$a = \frac{z - z_y}{z_x - z_y} \tag{11}$$

and 

$$1 - a = \frac{z - z_x}{z_y - z_x}. \tag{12}$$

So far, while the two values of $z_x$ 
and $z_y$ have been obtained for a given 
subject it is not known whether his 
levels of performance are relatively 
high on the X dimension and low on 
the Y dimension or vice versa, i.e., 
whether the subject belongs to Class 
A or to Class B. 
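
Equations (10), (11), and (12) may be checked with a short numerical sketch. The subject, the value of a, and the function names are hypothetical.

```python
def total_score(a: float, z_x: float, z_y: float) -> float:
    """Equation (10): z = a*z_x + (1 - a)*z_y."""
    return a * z_x + (1.0 - a) * z_y


def proportion_x(z: float, z_x: float, z_y: float) -> float:
    """Equation (11): a = (z - z_y) / (z_x - z_y); requires z_x != z_y."""
    return (z - z_y) / (z_x - z_y)


def proportion_y(z: float, z_x: float, z_y: float) -> float:
    """Equation (12): 1 - a = (z - z_x) / (z_y - z_x); requires z_x != z_y."""
    return (z - z_x) / (z_y - z_x)


# Hypothetical Class A subject (z_x > z_y) on a test where 3/4 of the items are X.
a, z_x, z_y = 0.75, 80.0, 40.0
z = total_score(a, z_x, z_y)
print(z)                          # 70.0
print(proportion_x(z, z_x, z_y))  # 0.75
print(proportion_y(z, z_x, z_y))  # 0.25
```
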


For class A, $z_y < z_x$; 
For class B, $z_x < z_y$. 


Let us define for each individual, $z_l$ as 
the lower of the two points, $z_x$ and 
$z_y$, and $z_u$ as the upper. Then, 


For class A, $z_l = z_y$ and $z_u = z_x$. 
For class B, $z_l = z_x$ and $z_u = z_y$. 


Substituting in (11) and (12) 


For class A: 

$$a = \frac{z - z_l}{z_u - z_l}. \tag{13}$$

For class B: 

$$1 - a = \frac{z - z_l}{z_u - z_l}. \tag{14}$$


The meaning of the quantity 
$(z - z_l)/(z_u - z_l)$ 
on a two-dimensional scale can 
be further elaborated by reference to 
Fig. 4. The test score, z, consists of 
the number of $z_x$ and $z_y$ items up to the 
point $z_l$ plus the number of $z_x$ items 
between $z_l$ and $z_u$ for Class A 
($z_y < z_x$), or the number of $z_y$ items between 
$z_l$ and $z_u$ for Class B ($z_x < z_y$). The 
number of $z_x$ or $z_y$ items constitutes a 
certain proportion, a or 1 - a, of the 
distance between $z_l$ and $z_u$. The 
number of items on the scale ($z_x$ and 
$z_y$ items combined) which corresponds 
to the number of $z_x$ or $z_y$ items which 
must be added to $z_l$ to obtain the total 
test score z is given by the distance 
between $z_l$ and z. If there are more 
$z_x$ items on the test than there are $z_y$ 
items, then for Class A ($z_x > z_y$), $z_x$ 
items must be added to $z_l$; this would 
be a large proportion of the distance 
$z_u - z_l$. For Class B ($z_x < z_y$), $z_y$ 
items must be added to $z_l$; this would 
be a smaller proportion of $z_u - z_l$. 
The quantity $(z - z_l)/(z_u - z_l)$ gives 
these proportions. 

Thus, the quantity $(z - z_l)/(z_u - z_l)$ can be 
computed for each subject. A fre- 
quency distribution of this quantity 
can be made. If a and 1 - a differ 
significantly from each other, and 
from 0.5, the distribution will be 
bimodal. It should be noted that the 
quantity $(z - z_l)/(z_u - z_l)$ will differ from 0.5 
only when $N_x \neq N_y$, as has been as- 
sumed in the above development. 
The method may be effective however 
in the case where $N_x = N_y$ when items 
from one dimension are piled up at the 
end of the test range, so that for most 
of the range of the test and for most 
subjects the items may be distributed 
uniformly and as if $N_x \neq N_y$. The 
presence of two modes enables us to 
classify the subjects in classes A and 
B. All subjects whose scores are 
above the higher mode can be as- 
signed to Class A. All subjects below 
the lower mode can be assigned to 
Class B. Once these subjects are 
classified, items can be assigned to 
classes X and Y by an analysis of the 
items between the two inconsistency 
distributions for each group. Those 
items which exhibit large percentages 
of plus or minus responses can be con- 
sidered as belonging to the different 
test dimensions, X and Y. The plus 
responses of Class A and minus re- 
sponses of Class B would belong to the 
same dimension; the minus responses 
of Class A and the plus responses of 
Class B to the other dimension. 
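
The classification of subjects by this quantity may be sketched as follows. The three subjects are hypothetical, and the positions of the two modes are simply assumed here rather than estimated from a frequency distribution. The assignment of the higher mode to Class A presupposes, as in the case shown in Fig. 4, that the X items are the more numerous.

```python
from typing import List, Optional, Tuple


def quantity(z: float, point_1: float, point_2: float) -> float:
    """(z - z_l) / (z_u - z_l), with z_l and z_u the lower and upper points."""
    z_l, z_u = min(point_1, point_2), max(point_1, point_2)
    return (z - z_l) / (z_u - z_l)


def classify(q: float, lower_mode: float, higher_mode: float) -> Optional[str]:
    """Class A above the higher mode, Class B below the lower mode, else unassigned."""
    if q > higher_mode:
        return "A"
    if q < lower_mode:
        return "B"
    return None


# Hypothetical subjects: (test score z, the two points of maximum inconsistency).
subjects: List[Tuple[float, float, float]] = [
    (72.0, 40.0, 80.0),
    (48.0, 40.0, 80.0),
    (60.0, 40.0, 80.0),
]
LOWER_MODE, HIGHER_MODE = 0.25, 0.75  # assumed positions of the two modes

qs = [quantity(z, p1, p2) for z, p1, p2 in subjects]
classes = [classify(q, LOWER_MODE, HIGHER_MODE) for q in qs]
print(qs)       # [0.8, 0.2, 0.5]
print(classes)  # ['A', 'B', None]
```
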

The occurrence of inconsistent re- 
sponses can also be used to identify 
the dimension to which an item be- 
longs. For both classes A and B the 
inconsistent responses of the lower 
distribution would occur on items be- 
longing to the same dimension as the 
minus responses in the area between 
the inconsistency distributions, and 
the inconsistent responses of the 
higher distribution would occur on 
items belonging to the same dimension 
as the plus responses in this area. 

(IIa). Another method of identify- 
ing the items on particular scales in a 
two-dimensional test consists essenti- 
ally of a pattern analysis of the re- 
sponses in the area between the two 
inconsistency distributions. The ra- 
tionale for this method consists of the 
fact that Class A subjects whose level 
of performance is low on the X dimen- 
sion and high on the Y dimension 
exhibit plus and minus responses on 
different items than do the Class B 
subjects whose level of performance is 
high on the X dimension and low on 
the Y dimension. If subjects are 
selected who have widely disparate 
performances on the two dimensions 
then all those subjects who show agree- 
ment on their pattern of responses 
fall into the same class; the subjects 
whose response patterns disagree fall 
into different classes. The method 
can be carried out as described in the 
following: 


(1) Those subjects are selected 
who indicate separated distributions 
of inconsistent responses. This will 
select those subjects with different 
levels of performance on the two di- 
mensions. 

(2) The responses of each subject 
in the area between the two incon- 
sistency distributions are compared 
with the responses in the same area 
for every other subject in the following 
manner; let us take the case of two 
subjects C and D: 

(a) The number of items (n) in the 
area between the two inconsistency 
distributions is obtained for both C 
and D. 

(b) This number n consists of both 
plus and minus responses; the number 
of plus and minus responses can be 
determined for each subject and the 
proportion of each of the total number 
of responses (n) can be obtained. 
These proportions can be written as 
$P_c$ and $Q_c$ for subject C and as 
$P_d$ and $Q_d$ for subject D. 

(c) The proportion of items on 
which plus responses for both subjects 
will occur by chance is given by: 


$$(P_c)(P_d) = P_{cd}.$$


The proportion of items on which 
minus responses for both subjects 
will occur by chance is given by: 


$$(Q_c)(Q_d) = Q_{cd}.$$


(d) The proportion of items on 
which either a plus or minus response 
for both subjects will occur is given by: 


$$(P_{cd}) + (Q_{cd}) = P_{cd}Q_{cd}.$$

Here the symbol $P_{cd}Q_{cd}$ names the 
sum on the left and is not a product. 

(e) The quantity 

$$n(P_{cd}Q_{cd}) = m$$


yields the number of items on which 
the two subjects would show agree- 
ment as to plus and minus responses 
by chance. 

(f) If the obtained number of agree- 
ments for the two subjects signifi- 
cantly exceeds m then the subjects 
can be considered as belonging to the 
same class; if the obtained number of 
agreements is significantly less than 
m, the two subjects can be placed in 
different classes. By this means two 
groups of subjects each with similar 
relative levels of performance on the 
two test dimensions can be obtained. 
For each group those items on which 
they indicate large percentages of 
plus and minus responses can be as- 
signed to the different dimensions of 
the test, as indicated previously. 
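
Steps (a) through (e) may be sketched as follows. The responses of the two subjects are hypothetical, the same n items are assumed for both, and no test of significance is included, since the procedure above does not specify one.

```python
from typing import List


def chance_agreements(c: List[int], d: List[int]) -> float:
    """m = n * (P_c*P_d + Q_c*Q_d): agreements expected by chance."""
    n = len(c)
    p_c, p_d = sum(c) / n, sum(d) / n
    q_c, q_d = 1.0 - p_c, 1.0 - p_d
    return n * (p_c * p_d + q_c * q_d)


def observed_agreements(c: List[int], d: List[int]) -> int:
    """Number of items on which both subjects give the same response."""
    return sum(1 for r_c, r_d in zip(c, d) if r_c == r_d)


# Hypothetical responses (1 = plus, 0 = minus) on the n = 8 items lying
# between the two inconsistency distributions.
subject_c = [1, 0, 1, 0, 1, 0, 1, 0]
subject_d = [1, 0, 1, 0, 1, 0, 0, 1]

print(chance_agreements(subject_c, subject_d))    # 4.0
print(observed_agreements(subject_c, subject_d))  # 6
```

Here the obtained number of agreements (6) exceeds the number expected by chance (4); whether the excess is significant would have to be judged by a suitable test before the two subjects were placed in the same class.
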

(IIb). A similar pattern analysis 
can also be done for inconsistent re- 
sponses. In this method, the occur- 
rence and non-occurrence of inconsist- 
ent responses on different items would 
be compared. Those subjects who 
show high agreement in their patterns 
of occurrence of inconsistent responses 
could be considered as belonging to 
the same class. 


SUMMARY 


This study has been concerned with 
an analysis of the method of multiple 
operation measurement in psycho- 
logical testing. 


(1) In the course of this analysis 
a theoretical model has been presented 
which postulates the following re- 
sponse properties of this kind of meas- 
urement: 

(a) On certain uni-dimensional 
tests, when the test items are ordered 
in terms of their scale position there 
is exhibited an approximately normal 
distribution of inconsistent responses 
in the zone of transition from plus to 
minus responses. 

(b) In a uni-dimensional test a test 
score may be considered as an estimate 
of the maximum point of this dis- 
tribution of inconsistent responses 
and should correspond to the modal or 
mean point of this distribution. 

(2) It has been possible on the 
basis of this model to postulate some 
expected relationships between test 
score and the amount of inconsistency 
of response to test items: 

(a) When the range of test items is 
adequate for a group being tested, 
there should be a zero correlation be- 
tween inconsistency and test score. 

(b) When the range of a test is in- 
adequate or cut off for a group at both 
ends of the scale of measurement, a 
near zero product-moment correlation 
and a correlation ratio greater than 
zero between inconsistency and test 
score should be expected. 


(c) When the test range is cut off 
at the upper end of a scale there 
should be a negative correlation be- 
tween test score and inconsistency. 

(d) When the test range is cut off 
at the lower end a positive correlation 
between test score and inconsistency 
should be obtained. 

(3) It has been possible from a pro- 
posed model of the response proper- 
ties of multiple operation measure- 
ment with two-dimensional tests, 
theoretically to apply these response 
properties to the problem of identify- 
ing the dimension on a test to which 
certain items belong. 


A second article will report the re- 
sults of an attempt to verify and 
apply empirically the postulates and 
techniques which have been presented. 